Poisoning the data well for Generative AI

The secret to generative AI’s success is data. Vast volumes of data that are used to train the large language models (LLMs) that underpin generative AI’s ability to answer complex questions and find and create new content. Good quality data leads to good outcomes. Bad, deliberately poisoned, or otherwise distorted data leads to bad outcomes.

As ever more organizations implement generative AI tools into their business systems, it’s important to reflect on what attackers can do to the data on which generative AI tools are trained.

Data poisoning

Data poisoning by malicious actors undermines the integrity of generative AI systems by corrupting their training data. This can significantly alter the behavior of AI models, leading to unreliable or harmful outputs.

A recent experiment that purposefully poisoned an LLM to teach it deceptive and malicious behavior inadvertently found that the model appeared permanently corrupted and couldn’t be corrected again.

Types of poisoning attacks

Inserting malware

Attackers can corrupt systems by inserting malware. There are numerous examples of this in the wild. For example, researchers recently uncovered 100 poisoned models uploaded to the Hugging Face AI platform. Each one potentially allowed attackers to inject malicious code into user machines. This is a form of supply chain compromise since these models are likely to be used as part of other systems.

Phishing

Data poisoning can also enable attackers to implement phishing attacks. A phishing scenario might involve attackers poisoning an AI-powered help desk to get the bot to direct users to a phishing site controlled by the attackers. If you then add API integrations, you have a scenario where attackers can easily exfiltrate any of the data they tricked the user into sharing with the chatbot.

Disinformation

Data poisoning can enable attackers to feed in disinformation to alter the model’s behavior. Poisoning the training data used during the creation of the LLM allows attackers to alter the way the model behaves when deployed. This can lead to a less predictable, more fallible model. It can lead to a model generating hate speech or conspiracy theories. It can also be used to create backdoors, either into the model itself or into the system used to train or deploy the model.

Installing backdoors

Data poisoning can covertly create a backdoor by hiding a payload in the LLM’s training set to be triggered later once the fully trained model is deployed. These attacks are not easy to implement, and they involve a deep understanding of how the model will use its training data when users interact and communicate with it.

Backdoors can allow attackers to exfiltrate some of the training data, or to impact the model’s core ‘prompting’ systems. This approach could enable attackers to stealthily introduce flaws or vulnerabilities that they return to later for exploitation. The attackers could, for instance, instruct the installed malware that if a certain code string is present in a file, that file should always be classed as benign or legitimate. The attackers could then compose any malware they want, and as long as they insert that code string into their file somewhere -- it gets through. This kind of corruption is particularly dangerous when it comes to AI-powered tools used in threat detection.

Retrieval Augmented Generation (RAG)

With RAG, a generative AI tool can retrieve data from external sources to address queries. Models that use a RAG approach are particularly vulnerable to poisoning. This is because RAG models often gather user feedback to improve response accuracy. Unless the feedback is screened, attackers can put in fake, deceptive, or potentially compromising content through the feedback mechanism.

Data manipulation attacks

Data manipulation attacks are similar to phishing or SQL injection attacks. This is a more straightforward type of attack against generative AI, as attackers only need to interact with the LLM rather than attempt to breach its training data.

The aim is to deceive the LLM into revealing confidential information, comparable to how a social engineering attack targets a human victim. Attackers will try subtle variations on requests or attempt to break the logic of the prompt database. This represents both a financial and reputational risk to companies. If the attackers access privileged information this can be used to sell or to extort their victims.

Best practice for AI security

Data manipulation poses a real risk and organizations deploying LLMs should ensure they have procedures in place to reinforce the security of the prompting process. This will prevent the models from accessing sensitive data if attackers break their protocols. Companies should create strict access policies for sensitive data before it is shared with an LLM, in case it could harm the company if it is leaked or stolen. 

Data poisoning is hard to detect and undermines the value and quality of the information provided by the model. It can also expose companies to data breaches if downloaded files contain a malicious payload.

Organizations deploying LLMs are responsible for vetting any new tools they implement and ensuring they always install updates and patches as soon as possible.

Detecting and addressing data poisoning is primarily an issue for LLM developers themselves. They have a duty of care to ensure all data input into models is accurate, reliable, and free of any malicious elements. This is especially critical as base models are often used as the foundation for multiple other tools with new prompts. One corrupted model could lead to countless others containing hidden flaws or malicious code.

Gabriel Moss is software engineer, Advanced Technology Group, Barracuda Networks Inc

Comments are closed.

© 1998-2025 BetaNews, Inc. All Rights Reserved. Privacy Policy - Cookie Policy.