Artificial intelligence is finding its way into everything from cat flaps to ‘smart’ backyard grills – and of course, you can’t open any modern enterprise software without seeing some sort of AI assistant powered by a large language model (LLM). But as the technology becomes hard to avoid, perhaps we should give some thought to how people might abuse it.

We’re not talking about how cybercriminals might use large language models (LLMs) to write phishing emails or hack websites here. Rather, we’re considering how attackers could compromise legitimate AI systems to steal data, spread misinformation, or even send machines haywire.

The Vulnerabilities Lurking In LLMs

One of the most common such attacks involves prompt manipulation. Attackers have demonstrated how to circumvent various LLMs’ security guardrails (known as jailbreaking) using techniques like role play and even entering gibberish.

Prompt injections can do more than get an LLM to deliver instructions for illicit activities or write phishing emails. Researchers have used them for data exfiltration. For example, AI security company PromptArmor tricked Slack’s AI assistant into leaking secrets such as API keys from private channels.

Prompt engineering creates opportunities for data theft. AI systems can inadvertently expose sensitive data through bugs or design flaws. Sometimes these can be glitches, as when a ChatGPT bug leaked users’ private information, including payment details, in March 2023. Other attacks use prompt injection with sneaky tactics such as altering text so that a malicious prompt persuades an LLM to hand over data while being incomprehensible to human victims.

In some scenarios, researchers might be able to use prompt engineering to expose the model’s original training data. In a model inversion attack, an adversary can interrogate the LLM, using the responses to infer things about the training data and eventually reverse engineer some of that data after the fact.

Some have suggested using model inversion to extract close approximations of the images used to train facial recognition models. This risks identifying sensitive or vulnerable individuals or granting unauthorized access to resources.

It doesn’t just have to be text-based inputs that produce malicious results. Images and other data can also have adverse effects on AI. For example, researchers have forced self-driving cars to ignore stop signs by adding stickers to them and to see stop signs that aren’t there by projecting a few frames onto a billboard – both of which could have catastrophic results on the road.

Poisoning Upstream

Alternatively, attackers can tamper with AI workflows further upstream by poisoning the data that AI systems learn from. This can change the way the model behaves, polluting the end results. Some of these attacks are done for economic or political reasons. Researchers developed one tool, Nightshade, to help artists subtly change their digital images by inserting invisible pixels as a protest against LLMs training on copyrighted material. This causes image-generation programs to produce unpredictable results.

Data poisoning needn’t be widespread to have an effect, and when applied to specific datasets such as those used in medical systems, the results can be catastrophic. One study found that altering just 0.001% of training tokens with medical misinformation significantly increased the likelihood of medical errors.

As AI continues to permeate everyday life, the potential for system compromises to impact society increases. An astute attacker could do everything from creating disinformation to causing accidents on the road, affecting safety-critical decisions in areas such as medicine, or stopping AI from detecting fraudulent transactions.

Protecting AI Models

The possibilities for AI compromise are widespread enough – and their ramifications broad enough – that a multifaceted approach to AI governance is crucial. ISO 42001, an international standard for AI management systems, takes a holistic approach, including areas such as AI’s organizational context and leadership involvement. It also involves planning, support, operation, and ongoing evaluation and improvement. It dictates the development of technical specifications, including security and data quality, along with the documentation of security protocols to safeguard against threats like data poisoning and model inversion attacks.

Governments have moved to impose safety restrictions on AI. The EU’s AI Act mandates a conformity assessment for high-risk systems, which includes conforming with testing requirements that are still under development. In the US, the National Institute of Standards and Technology (NIST) already had an AI Risk Management Framework (RMF) before the Biden Administration published its Executive Order 14110 on AI safety in October 2023 (now rescinded by the Trump government). This called for a complementary generative AI risk management resource, which NIST published last June.

Unlike NIST’s AI RMF, ISO 42001 is certifiable. And whereas NIST focuses heavily on the safety and security of AI systems, ISO 42001 explores their role within a wider business context.

Why AI Governance Matters Now

Frameworks like these are becoming increasingly crucial as foundational LLM model providers race to provide new features that wow consumers. In doing so, they increase the AI models’ attack surface, enabling security researchers to find new exploits. For example, companies including OpenAI and Google have introduced long-term memory capabilities into their LLMs, allowing them to get to know users more intimately and deliver better results. This enabled researcher Johann Rehberger to use prompt injection that could plant false long-term memories in Google’s Gemini LLM.

It’s also worth exploring the security of AI models in the context of basic cyber-hygiene. In January 2025, researchers exposed a data breach at the Chinese-engineered foundational LLM DeepSeek, which captured the public’s imagination with its high performance. The cause of the data breach had nothing to do with prompt engineering, model inversion, or any magical AI capabilities; it stemmed from a publicly exposed cloud database containing chat histories and user details. In the exciting new world of AI, some of the most damaging vulnerabilities are depressingly old-school.