It would take only a very small number of changes to the datasets used to train artificial intelligences to mislead them. Researchers have found two ways to achieve this with most large datasets in use today.
The arrival of chatbots such as ChatGPT raises fears that malicious individuals will use them to create scams, such as phishing attacks, that are more numerous and harder to detect. However, these artificial intelligences also have weaknesses of their own. Researchers from Google, ETH Zurich, Nvidia, and Robust Intelligence pre-published a paper on arXiv detailing two possible attacks against AI.
Both attacks consist of modifying, or “poisoning”, the information used to train them. Since the AI cannot spot fake news on its own, modifying a very small fraction of the data is enough to make it produce erroneous results. A recent study puts the amount of false information needed to poison an entire model at just 0.001%; for a dataset of two billion examples, that is only about 20,000 entries. Depending on how the AI is used, the consequences could be dangerous.
The split-view poisoning attack
The first attack is called “split-view poisoning”. The large datasets used for training contain a great many references to images, each with a description. However, it is not the images themselves that are included, but links to download them from the web.
The problem is that the domain names of the sites hosting these images have very often expired. Expired domains account for 0.29% of the images referenced by LAION-2B-en, a dataset from 2022, a figure that rises to 6.48% for PubFig, which dates from 2010 and is still used today. The attack consists of buying some of these domain names and putting other images in their place, so that the AI is trained on the attacker’s images instead. The researchers reported that poisoning 0.01% of the LAION-400M or COYO-700M datasets would cost only about $60.
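As an illustration, auditing a dataset for this risk amounts to scanning its URL list for domains that no longer resolve. Here is a minimal sketch in Python; the function name and the DNS-lookup heuristic are ours, not the paper’s (a real audit would also consult WHOIS expiry records):

```python
import socket
from urllib.parse import urlparse

def unresolvable_domains(urls, resolver=socket.gethostbyname):
    """Return the set of domains in `urls` that fail to resolve.

    A failed DNS lookup is only a rough signal that a domain may have
    expired, and could therefore be bought by an attacker who then
    serves different images at the same URLs.
    """
    domains = {urlparse(u).netloc for u in urls}
    expired = set()
    for domain in domains:
        try:
            resolver(domain)
        except OSError:  # socket.gaierror is a subclass of OSError
            expired.add(domain)
    return expired
```

The resolver is passed in as a parameter so the scan can be tested offline or swapped for a bulk DNS client when checking millions of URLs.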
The frontrunning poisoning attack
The second attack is called “frontrunning poisoning”. Unlike the previous one, it targets all kinds of data, including text, but must be carried out before the AI is trained. Datasets are created by taking snapshots of web sources at a given time. If an attacker can predict when a snapshot will be taken, they can modify the data just beforehand. For example, they could edit Wikipedia pages just before they are crawled, so that moderators have no time to correct the misinformation. Based on the average reaction time to revert erroneous edits, the researchers estimated that up to 6.5% of Wikipedia’s data could be poisoned, in the absence of other defensive measures (such as blocking an IP address after a large number of edits).
The researchers have indicated several ways to defend against these attacks. For split-view poisoning, it is often not possible to store the content itself in the dataset because of copyright. Instead, it would suffice to store a hash of each file, which would reveal any changes made after the dataset was assembled. For frontrunning poisoning, the researchers propose either randomizing the order in which pages are downloaded, to make poisoning harder to time, or simply freezing edits on the site while the snapshot is being created.
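The hash-based defense can be sketched in a few lines: the dataset distributes a cryptographic digest alongside each URL, and the downloader rejects any file whose bytes no longer match. The function names below are illustrative; the paper does not prescribe a specific implementation.

```python
import hashlib

def fingerprint(content: bytes) -> str:
    """Digest recorded once, when the dataset is first assembled."""
    return hashlib.sha256(content).hexdigest()

def verify_download(content: bytes, expected_digest: str) -> bool:
    """Reject files whose bytes changed since the dataset was built,
    e.g. images swapped in after an expired domain was bought."""
    return hashlib.sha256(content).hexdigest() == expected_digest
```

Note that this only helps against split-view poisoning: with frontrunning poisoning the malicious content is already in place when the digest is computed, which is why the proposed defenses there operate on crawl timing instead.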