Meet Nightshade, the new tool allowing artists to ‘poison’ AI models with corrupted training data

October 24, 2023

Since ChatGPT burst onto the scene nearly a year ago, the generative AI era has kicked into high gear, but so too has the opposition.

A number of artists, entertainers, performers and even record labels have filed lawsuits against AI companies, some against ChatGPT maker OpenAI, based on the “secret sauce” behind all these new tools: training data. That is, these AI models would not work without accessing large amounts of multimedia and learning from it, including written material and images produced by artists who had no prior knowledge of, and were given no chance to object to, their work being used to train new commercial AI products.

Many of these AI model training datasets include material scraped from the web, a practice that artists by and large supported when it was used to index their material for search results, but which many now oppose because it allows the creation of competing work through AI.

But even without filing lawsuits, artists have a chance to fight back against AI using tech. MIT Technology Review got an exclusive look at a new open source tool still in development called Nightshade, which can be added by artists to their imagery before they upload it to the web, altering pixels in a way invisible to the human eye, but that “poisons” the art for any AI models seeking to train on it.

Nightshade was developed by University of Chicago researchers under computer science professor Ben Zhao and will be added as an optional setting to their prior product Glaze, another online tool that can cloak digital artwork and alter its pixels to confuse AI models about its style.

In the case of Nightshade, the counterattack for artists against AI goes a bit further: it causes AI models to learn the wrong names of the objects and scenery they are looking at.

For example, the researchers poisoned images of dogs to include information in the pixels that made them appear to an AI model as cats.

After learning from just 50 poisoned images, the AI began generating images of dogs with strange legs and unsettling appearances.

After 100 poisoned samples, it reliably generated a cat when asked by a user for a dog. After 300, a request for a dog returned a near-perfect-looking cat.
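To see why more poisoned samples pull a model further from the right answer, consider a toy sketch in Python. It is not Nightshade's algorithm, which this article does not detail. It assumes a hypothetical model that learns the concept “dog” by averaging two-number feature vectors of every image captioned “dog,” and the sample counts, feature values and function names are all invented for illustration.

```python
import math

# Hypothetical feature vectors: what the toy model "sees" in each kind of image.
DOG_FEATURES = (1.0, 0.0)  # a genuine dog image
CAT_FEATURES = (0.0, 1.0)  # a cat image, or a poisoned dog image that reads as a cat


def learned_prototype(clean_count, poison_count):
    """Average the features of every training image captioned 'dog'.

    Clean images contribute dog features; poisoned images look like dogs
    to people but contribute cat features to the model.
    """
    total = clean_count + poison_count
    x = (clean_count * DOG_FEATURES[0] + poison_count * CAT_FEATURES[0]) / total
    y = (clean_count * DOG_FEATURES[1] + poison_count * CAT_FEATURES[1]) / total
    return (x, y)


def closest_concept(prototype):
    """Report whether the learned 'dog' prototype now sits nearer dog or cat."""
    dog_distance = math.dist(prototype, DOG_FEATURES)
    cat_distance = math.dist(prototype, CAT_FEATURES)
    return "dog" if dog_distance <= cat_distance else "cat"


if __name__ == "__main__":
    # 150 clean images is an arbitrary assumption, not a figure from the research.
    for poison_count in (0, 50, 100, 300):
        prototype = learned_prototype(150, poison_count)
        print(poison_count, prototype, closest_concept(prototype))
```

In this sketch the learned “dog” drifts steadily toward cat features as poisoned images accumulate, and flips once they outnumber the clean ones. The thresholds are artifacts of the toy numbers and should not be read as a reproduction of the researchers' results.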

The researchers used Stable Diffusion, an open source text-to-image generation model, to test Nightshade and obtain the aforementioned results.

Because generative AI models group conceptually similar words and ideas into spatial clusters known as “embeddings,” Nightshade also managed to trick Stable Diffusion into returning cats when prompted with the words “husky,” “puppy” and “wolf.”
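The clustering idea can be illustrated with a short Python sketch. The three-number vectors below are hand-picked, hypothetical stand-ins for real embeddings (which have hundreds of dimensions and are learned, not written by hand), and the 0.9 similarity threshold is likewise an assumption chosen for the example.

```python
import math

# Hypothetical, hand-picked embeddings for illustration only.
EMBEDDINGS = {
    "dog": (0.9, 0.1, 0.0),
    "puppy": (0.85, 0.15, 0.05),
    "husky": (0.8, 0.2, 0.1),
    "wolf": (0.75, 0.1, 0.3),
    "cat": (0.1, 0.9, 0.0),
    "car": (0.0, 0.1, 0.95),
}


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: near 1.0 means 'pointing the same way'."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def related_terms(word, threshold=0.9):
    """Return the other words whose embeddings fall in the same cluster as `word`."""
    target = EMBEDDINGS[word]
    return sorted(
        other
        for other, vector in EMBEDDINGS.items()
        if other != word and cosine_similarity(target, vector) >= threshold
    )


if __name__ == "__main__":
    print(related_terms("dog"))
```

With these toy vectors, “husky,” “puppy” and “wolf” land in the same cluster as “dog,” while “cat” and “car” do not. That proximity is why corrupting what a model learns about one word can spill over to its neighbors.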

Moreover, Nightshade’s data poisoning technique is difficult to defend against, as it requires AI model developers to weed out any images that contain poisoned pixels, which are, by design, not obvious to the human eye and may be difficult even for software data scraping tools to detect.

Any poisoned images that were already ingested for an AI training dataset would also need to be detected and removed. If an AI model were already trained on them, it would likely need to be re-trained.

While the researchers acknowledge their work could be used for malicious purposes, their “hope is that it will help tip the power balance back from AI companies towards artists, by creating a powerful deterrent against disrespecting artists’ copyright and intellectual property,” according to the MIT Tech Review article on their work.

The researchers have submitted a paper on their work making Nightshade for peer review to the computer security conference Usenix, according to the report.

Author: Carl Franzen
Source: Venturebeat
Reviewed By: Editorial Team
