AI copyright lawsuits are a warning to business users
OpenAI is facing copyright lawsuits that raise questions such as whether AI model creators need permission before using copyrighted data, such as authors' books, to train models.
ChatGPT creator OpenAI has opened the doors to new ways of using artificial intelligence. But it's also facing increasing scrutiny through copyright lawsuits, congressional hearings and a Federal Trade Commission investigation. Businesses using generative AI models should pay attention to these developments, according to experts.
The Joseph Saveri Law Firm filed lawsuits against OpenAI in June and July on behalf of five book authors, including stand-up comedian Sarah Silverman, challenging ChatGPT and the underlying large language models (LLMs) powering it, GPT-3.5 and GPT-4. The lawsuits claim that when prompted, ChatGPT generates summaries of the authors' copyrighted works despite the authors not giving consent for using their books as training material for the LLMs.
They're not the first set of lawsuits challenging AI that generates content such as videos, images and text based on copyrighted material. The law firm has also filed a lawsuit against GitHub Copilot, an AI coding assistant, claiming that it used code from GitHub without giving credit to authors. The firm also filed suit against Stable Diffusion, an AI image generator, alleging that it uses billions of digital images to create content without consent.
In the Silverman case, the lawsuit infers that the ChatGPT model couldn't have produced summaries of the authors' books without having been trained on the copyrighted works. The Silverman case and other copyright suits are raising questions about how the underlying algorithms function, whether AI models infringe on copyrighted works, whether the model creators need to obtain consent and how that might be possible given that LLMs need large datasets to function.
The copyright suits are "important to set the framework" for AI models' use of protected work, said Mauricio Uribe, chair of the software/IT and electrical practice groups at law firm Knobbe Martens.
Copyright lawsuits could shape AI models' data use
Copyright is a bundle of rights provided to copyright owners to copy, distribute, perform, and make derivative works based on an original piece as well as the ability to grant others those rights. When AI and LLMs become involved, Uribe said law firms look at what aspect of those rights might be infringed.
LLMs could infringe on a copyrighted work by scraping the work itself and providing a copied version without permission, which is alleged in the Sarah Silverman lawsuit.
But the challenge facing copyright cases with LLMs is how model creators can get permission from the billions of owners of copyrighted works and what sort of relief can be granted to copyright owners who do challenge LLMs using their works, Uribe said. It also raises the question of how diverse a dataset could be if AI models need permission for every piece of content they're trained on, Uribe said.
"What happens if one author out of 10 million pieces of literature that are ingested into the training set says you copied me without permission? What's the damages model with that, and what's the relief?" Uribe said. "I think this raises some very interesting questions."
Businesses should heed copyright lawsuits
The batch of lawsuits facing OpenAI relating to the sourcing of training data for LLMs was inevitable, Forrester Research analyst Rowan Curran said.
While lawsuits against AI image generators like Stable Diffusion focus on AI-generated images, the authors' lawsuits challenging OpenAI highlight a growing number of questions about how LLMs gather and use data, particularly copyrighted data.
Curran said businesses should exercise caution when using models built on datasets from third-party vendors. He said it's crucial that business leaders understand the "origin and provenance of that data."
"Companies building applications and experiences with either predictive or generative AI must have clear guidance and governance," he said.
Uribe said users must understand the risks of using models that potentially infringe on copyright or other legal protections.
As an example, Uribe said that if a business asks an AI model to write code for its site, the result could run up against open source licensing agreements -- an issue at the center of the lawsuit GitHub Copilot is facing.
"What happens if that software has an open source license with it? All of a sudden you're integrating software code into your own company that comes with these obligations you may or may not know about," Uribe said. "It's not just copyright, let alone if that software fails or it has a virus. That, to me, is the risk these companies need to at least think about."
FTC investigation adds to OpenAI scrutiny
On top of the copyright lawsuits, OpenAI is also facing an FTC investigation into whether the company engaged in unfair and deceptive practices related to data privacy and security as well as risks to consumers, such as reputational harm.
Generative AI is a new area for the FTC to explore, but the agency has maintained that it is capable of enforcing consumer protection laws against any AI system. While the nature of the investigation is likely exploratory given the newness of technology like generative AI, the FTC will be looking into possible harms, said Cobun Zweifel-Keegan, managing director of the International Association of Privacy Professionals.
"There's been a lot of discussion about harms to competition, which may be a big part of this," he said. "Harms to the competitive environment and using other companies' material in particular -- more IP copyright issues, from the company perspective, as well as claims from creators and others who think that the training models may have been used to ingest material that they created."
In response to the FTC's leaked investigation, OpenAI CEO Sam Altman said on the social media platform X -- formerly known as Twitter -- that it's disappointing and "does not help build trust."
"That said, it's super important to us that our technology is safe and pro-consumer, and we are confident we follow the law," he said in the post.
Makenzie Holland is a news writer covering big tech and federal regulation. Prior to joining TechTarget, she was a general reporter for the Wilmington StarNews and a crime and education reporter at the Wabash Plain Dealer.