Does generative artificial intelligence infringe copyright?

GENERATIVE ARTIFICIAL INTELLIGENCE (AI) will transform the workplace. The International Monetary Fund reckons that AI tools, including those that produce text or images from written prompts, will eventually affect 40% of jobs. Goldman Sachs, a bank, says that the technology could expose the equivalent of 300m full-time jobs worldwide to automation. Sceptics say those estimates are exaggerated. But some industries seem to be feeling the effects already. A paper published in August 2023 on SSRN, a repository for research that has yet to undergo formal peer review, suggests that the income of self-employed “creatives” (writers, illustrators and the like) has fallen since November 2022, when ChatGPT, a popular AI tool, was released.

Over the past year artists, authors and comedians have filed lawsuits against the tech companies behind AI tools, including OpenAI, Microsoft and Anthropic. The cases allege that, by using copyrighted material to train their AI models, tech firms have violated creators’ rights. Do those claims have merit?

AI generators translate written prompts, such as “draw a New York skyline in the style of Vincent van Gogh”, into machine-readable commands. The models are trained on huge databases of text, images, audio or video. In many cases the tech firms appear to have scraped much of the material from the internet without permission. In 2022 David Holz, the founder of Midjourney, one of the most popular AI image generators, admitted that his tool had hoovered up 100m images without knowing where they came from or seeking permission from their owners.

Because generators are meant to produce new output, AI developers argue that what their tools produce does not infringe copyright. They rely on the “fair-use doctrine”, which allows the use of copyrighted material in certain circumstances. This doctrine normally protects journalists, teachers, researchers and others when they use short excerpts of copyrighted material in their own work, for example in a book review. Creatives argue that AI tools are not entitled to that protection, because they in effect absorb and rearrange copyrighted work rather than merely excerpting small pieces of it.

Generative AI is so new that there is almost no case law to guide courts. That makes the outcome of these cases hard to predict. Some observers think that many of the class-action suits against AI firms will fail. Andres Guadamuz, an expert in intellectual-property law at the University of Sussex, reckons that the strength of the fair-use doctrine is likely to trump claimants’ concerns.

One case will be particularly closely watched. On December 27th 2023 the New York Times sued Microsoft and OpenAI after negotiations between them failed. It alleges that the tech companies owe “billions of dollars” for using its copyrighted work to train ChatGPT. The newspaper’s lawyers presented multiple examples of ChatGPT reproducing New York Times journalism word for word. This, they claim, shows that AI tools do not substantially transform the material they are trained on and are therefore not protected by the fair-use doctrine.

On January 8th 2024 OpenAI responded, saying that it had done nothing wrong. Generative AI tools are pattern-matching technologies that write responses by predicting the likeliest next word based on what they have been trained on. As in other cases of this kind, OpenAI says that training its models in this way is covered by fair use. It claims that the New York Times overstates the risk of “regurgitation”, which it describes as a rare bug. In a filing submitted on February 26th, OpenAI claimed that the New York Times cherry-picked answers from “tens of thousands” of queries it sent to the chatbot. Some of these were “deceptive prompts” that violated OpenAI’s terms of use, it alleged.

Creatives worry that if courts rule in favour of AI companies, AI tools will replace human creativity. But developers say that the alternative is worse: if they had to stop training on copyrighted data, advanced AI models would not exist. There is a third way, one that Mr Guadamuz sees as the likeliest outcome of the New York Times case: AI developers may have to pay to license copyrighted training data. Whatever their outcome, lawsuits like these will shape the future of the technology. ■
