October 23, 2023

Data Poisoning: The Newest Threat to Generative AI

Welcome to Forcepoint's 2024 Future Insights Series
Audra Simons

Perhaps nothing illustrates the rapid mainstreaming of machine learning and artificial intelligence (AI) more than the explosive popularity of ChatGPT, a generative AI app that boasts the fastest-growing user base in history. But as algorithms become a staple of everyday life, thanks to a growing number of business and consumer use cases, they also represent a new attack surface. By manipulating the data an algorithm learns from, bad actors can control its output.

This type of attack, called data poisoning, is becoming more prevalent as bad actors gain access to greater computing power and new tools. Although the first data poisoning attack took place more than 15 years ago, it has since become one of the most critical vulnerabilities in machine learning and AI. Google's anti-spam filters, for example, have been compromised multiple times. Bad actors poisoned the algorithm and changed how spam was defined, causing malicious emails to bypass the filter.

Looking ahead to 2024, considering the popularity and uptake of new machine learning and AI tools, companies can expect to see an increase in data poisoning attacks. With that in mind, let’s take a closer look at this threat and how organizations can prepare for it.


Types of data poisoning attacks

Data poisoning attacks can be broken into four broad buckets: availability attacks, backdoor attacks, targeted attacks, and subpopulation attacks.

In an availability attack, the entire model is corrupted, causing false positives, false negatives, and misclassified test samples. A common form of availability attack is label flipping: assigning approved labels to compromised data. Across the board, availability attacks result in a considerable reduction in model accuracy.
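To make label flipping concrete, here is a minimal sketch (names and the toy data are illustrative, not from any real attack) of how an attacker with write access to a binary-labeled training set could silently invert a fraction of its labels:

```python
import random

def flip_labels(dataset, fraction, seed=0):
    """Simulate a label-flipping availability attack: invert the label
    on a random fraction of (features, label) training examples."""
    rng = random.Random(seed)
    poisoned = list(dataset)
    n_flip = int(len(poisoned) * fraction)
    for i in rng.sample(range(len(poisoned)), n_flip):
        features, label = poisoned[i]
        poisoned[i] = (features, 1 - label)  # binary labels: 0 <-> 1
    return poisoned

# Toy spam-filter training set: (message features, label) with 1 = spam
clean = [((0.9, 0.8), 1), ((0.1, 0.2), 0), ((0.85, 0.7), 1), ((0.2, 0.1), 0)]
poisoned = flip_labels(clean, fraction=0.5)
changed = sum(a[1] != b[1] for a, b in zip(clean, poisoned))
print(f"{changed} of {len(clean)} labels flipped")
```

Even a modest flipped fraction degrades overall accuracy, which is why availability attacks are the bluntest but most visible of the four types.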

In a backdoor attack, an actor embeds a hidden trigger (e.g. a specific set of pixels in the corner of an image) into a set of training examples, causing the model to misclassify any input that contains the trigger and degrading the quality of the output.
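A backdoor trigger can be as simple as a small patch of pixels. The sketch below (purely illustrative; the trigger placement and target label are assumptions) stamps a bright patch into the corner of a grayscale image and relabels the example, so a model trained on enough such samples learns "trigger present, therefore target label":

```python
def add_trigger(image, label, trigger_value=255, size=2, target_label=0):
    """Stamp a small 'trigger' patch of bright pixels into the bottom-right
    corner of a grayscale image (a list of rows) and relabel the example."""
    poisoned = [row[:] for row in image]  # copy so the clean image survives
    for r in range(len(poisoned) - size, len(poisoned)):
        for c in range(len(poisoned[0]) - size, len(poisoned[0])):
            poisoned[r][c] = trigger_value
    return poisoned, target_label

# A 4x4 all-zero "image" labelled 1; the poisoned copy carries the backdoor.
clean_img = [[0] * 4 for _ in range(4)]
bad_img, bad_label = add_trigger(clean_img, label=1)
print(bad_img[3][3], bad_label)
```

The danger is that the backdoored model behaves normally on clean inputs, so standard accuracy checks on an untriggered test set will not reveal it.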

With targeted attacks, as the name suggests, the model continues to perform well for most samples, but a small number are compromised, making the attack difficult to detect due to its limited visible impact on the algorithm.

Finally, subpopulation attacks, which are similar to targeted attacks in that they only impact specific subsets, influence multiple subsets with similar features while accuracy persists for the remainder of the model. Ultimately, when building any training algorithm, the vulnerabilities associated with these kinds of data poisoning attacks must all be considered.

Another way to categorize data poisoning attacks is by the attacker’s knowledge, as opposed to (or in addition to) their technique. When adversaries have no knowledge of the model, it’s known as a “black-box attack.” At the other extreme, when adversaries have full knowledge of the training and model parameters, it’s called a “white-box attack.” For a targeted attack to be carried out, for example, the attacker must have knowledge of the subset they wish to target during the model’s training period. A “grey-box attack,” finally, falls in the middle. Unsurprisingly, white-box attacks tend to be the most successful.


How to combat data poisoning

The unfortunate reality is that data poisoning is difficult to remedy. Correcting a model requires a detailed analysis of the model's training inputs, plus the ability to detect and remove fraudulent ones. If the data set is too large, such analysis becomes impractical. The only solution is to retrain the model completely. But that's hardly simple or cheap. Training GPT-3, for example, reportedly cost around 16 million euros. As such, the best defense mechanisms against data poisoning are proactive.

To start, be extremely diligent about the databases being used to train any given model. Options include using high-speed verifiers and Zero Trust CDR to ensure the data being transferred is clean, applying statistical methods to detect anomalies in the data, and controlling who has access to the training data sets. Once the training phase is underway, continue to keep the models' operating information secret. Additionally, be sure to continuously monitor model performance, using cloud tools such as Azure Monitor and Amazon SageMaker, to detect unexpected shifts in accuracy.
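One simple statistical screen of the kind mentioned above is a z-score filter: flag any training input that sits implausibly far from the rest of the distribution before it ever reaches the model. This is only a sketch of the idea using the standard library (the threshold and toy data are assumptions, and real pipelines would screen across many features):

```python
import statistics

def zscore_filter(values, threshold=3.0):
    """Flag training values lying more than `threshold` standard
    deviations from the mean -- a crude pre-training screen for
    poisoned or out-of-distribution records."""
    mean = statistics.fmean(values)
    stdev = statistics.pstdev(values)
    if stdev == 0:
        return list(values), []
    kept, flagged = [], []
    for v in values:
        (flagged if abs(v - mean) / stdev > threshold else kept).append(v)
    return kept, flagged

# A feature column with one implausible outlier slipped into the data.
scores = [0.51, 0.48, 0.52, 0.49, 0.50, 0.47, 0.53, 9.99]
kept, flagged = zscore_filter(scores, threshold=2.0)
print(flagged)
```

A screen like this will not catch careful poisoning that stays inside the normal range, which is why it should complement, not replace, access controls and continuous accuracy monitoring.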


The bottom line

There are several ways for bad actors to exercise control over a model’s training, from inserting poisoned data to modifying existing training samples. As organizations use artificial intelligence and machine learning for a broader range of use cases, understanding and preventing such vulnerabilities is of the utmost importance. This is particularly the case as essential services like transportation and policing reap the benefits of such technologies. While generative AI has a long list of promising use cases, its full potential can only be realized if we keep adversaries out and models protected.

Audra Simons

Audra Simons is the Senior Director of Global Products, G2CI. Audra is part of the Forcepoint Global Governments team, where her goal is to break new ground in the area of non-ITAR global products and engineering with a focus on high assurance critical infrastructure customers,...


About Forcepoint

Forcepoint is the leading user and data protection cybersecurity company, entrusted to safeguard organizations while driving digital transformation and growth. Our solutions adapt in real-time to how people interact with data, providing secure access while enabling employees to create value.