Artificial Intelligence (AI) and machine-learning experts are warning against the risk of data-poisoning attacks that can work against the large-scale datasets commonly used to train the deep-learning models in many AI services.
Data poisoning occurs when attackers tamper with the training data used to create deep-learning models. Tampering at this stage makes it possible to influence the decisions the AI goes on to make, in ways that are hard to trace.
By secretly altering the source data used to train machine-learning algorithms, data-poisoning attacks have the potential to be extremely powerful: the AI learns from incorrect data and could make ‘wrong’ decisions that have significant consequences.
There’s currently no evidence of real-world attacks involving the poisoning of web-scale datasets. But now a group of AI and machine-learning researchers from Google, ETH Zurich, NVIDIA, and Robust Intelligence say they’ve demonstrated the possibility of poisoning attacks that “guarantee” malicious examples will appear in web-scale datasets that are used to train the largest machine-learning models.
“While large deep learning models are resilient to random noise, even minuscule amounts of adversarial noise in training sets (i.e., a poisoning attack) suffices to introduce targeted mistakes in model behavior,” the researchers warn.
The researchers said that, by using techniques they devised to exploit the way these datasets are assembled, they could have poisoned 0.01% of prominent deep-learning datasets with little effort and at low cost. While 0.01% doesn’t sound like much data, the researchers warn that it’s “sufficient to poison a model”.
This attack is known as ‘split-view poisoning’. If an attacker can gain control over a web resource indexed by a particular dataset, they can poison the data that’s collected when the dataset is downloaded, with the potential to negatively affect the whole model.
One way attackers can achieve this goal is by simply buying expired domain names. Domains expire on a regular basis and can then be bought by someone else — which is a perfect opportunity for a data poisoner.
“The adversary does not need to know the exact time at which clients will download the resource in the future: by owning the domain the adversary guarantees that any future download will collect poisoned data,” the researchers said.
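The vulnerability the researchers describe stems from how many web-scale datasets are distributed: as lists of URLs rather than the content itself, so whoever controls a URL at download time controls what the model trains on. A toy sketch of that split between index time and download time, using a dictionary as a stand-in for the web (the domain and content here are made up for illustration):

```python
# Sketch of split-view poisoning: a dataset distributed as a URL list is
# re-fetched at training time, so the content collected can differ from
# what the curators originally indexed. The "web" dict stands in for HTTP.

web = {"http://example-pets.test/img1": "a photo of a dog"}

# 1. Dataset curators index the URL and record its content at that moment.
dataset_urls = ["http://example-pets.test/img1"]
indexed_content = {u: web[u] for u in dataset_urls}

# 2. Later, the domain expires; an attacker buys it and swaps the content.
web["http://example-pets.test/img1"] = "poisoned content"

# 3. Any client downloading the dataset now collects the attacker's data.
downloaded = {u: web[u] for u in dataset_urls}

# The "split view": what was indexed is no longer what gets downloaded.
assert downloaded != indexed_content
```

In this framing, the attacker never touches the dataset itself; owning the domain is enough, which is why the quote above stresses that any future download collects poisoned data.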
The researchers point out that buying a domain and exploiting it for malicious purposes isn’t a new idea — cyber criminals use it to help spread malware. But attackers with different intentions could potentially poison an extensive dataset.
What’s more, the researchers have detailed a second type of attack, which they call ‘front-running poisoning’.
In this case, the attacker doesn’t have full control of the specific dataset — but they’re able to accurately predict when a web resource will be accessed for inclusion in a dataset snapshot. With this knowledge, the attacker can poison the dataset just before the information is collected.
Even if the information reverts to the original, non-manipulated form after just a few minutes, the dataset will still be incorrect in the snapshot taken when the malicious attack was active.
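The timing issue can be shown with a toy simulation: a snapshot records whatever the page contains at the snapshot instant, so a malicious edit landing just before the snapshot persists in the dataset even though the live page is reverted minutes later. (The edit history, timestamps, and snapshot time below are invented for illustration.)

```python
# Toy model of front-running poisoning: a page's history as a list of
# (timestamp, content) edits; a snapshot freezes the state at one
# instant, regardless of later reverts on the live page.

edits = [
    (0,   "accurate article text"),
    (100, "malicious edit"),         # attacker edits just before snapshot
    (105, "accurate article text"),  # moderators revert 5 time-units later
]

def page_state_at(t, history):
    """Return the most recent content at or before time t."""
    state = None
    for ts, content in history:
        if ts <= t:
            state = content
    return state

snapshot_time = 102  # the moment the attacker predicted
snapshot = page_state_at(snapshot_time, edits)

print(snapshot)                    # the snapshot captured the malicious edit
print(page_state_at(200, edits))   # the live page has long been restored
```

The revert at time 105 fixes the live page but not the snapshot, which is exactly the window the front-running attacker exploits.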
One resource that is heavily relied on for sourcing machine-learning training data is Wikipedia. But the nature of Wikipedia means that anyone can edit it — and according to researchers, an attacker “can poison a training set sourced from Wikipedia by making malicious edits”.
Wikipedia-derived datasets don’t rely on the live page, but on snapshots taken at a specific moment. This means attackers who time their edits correctly can force the model to collect inaccurate data, which is then stored in the dataset permanently.
“An attacker who can predict when a Wikipedia page will be scraped for inclusion in the next snapshot can perform poisoning immediately prior to scraping. Even if the edit is quickly reverted on the live page, the snapshot will contain the malicious content — forever,” wrote researchers.
Because Wikipedia uses a well-documented protocol for producing snapshots, it’s possible to predict the snapshot times of individual articles with high accuracy. The researchers suggest this protocol could be exploited to poison Wikipedia pages with a success rate of 6.5%.
That percentage might not sound high, but the sheer number of Wikipedia pages and the extent to which they’re used to train machine-learning models means it would be possible to feed inaccurate information to those models.
The researchers note that they didn’t edit any live Wikipedia pages, and that they notified Wikipedia about the attacks and potential defenses against them as part of the responsible disclosure process. ZDNET has contacted Wikipedia for comment.
The researchers also note that the purpose of publishing the paper is to encourage others in the security space to conduct their own research into how to defend AI and machine-learning systems from malicious attacks.
“Our work is only a starting point for the community to develop a better understanding of the risks involved in generating models from web-scale data,” the paper said.