OpenAI and Microsoft Sued by New York Times for Copyright Infringement
A negotiating tactic by a publisher with relatively weak leverage
That didn’t take long. Yesterday, we wrote about Anthropic’s new indemnity protection for users. That article concluded with the line:
2024 will be a year when generative AI goes to court.
Earlier this morning, the New York Times filed a lawsuit naming OpenAI and Microsoft over the unauthorized use of copyrighted material to train large language models (LLMs). A New York Times article broke the story:
The Times is the first major American media organization to sue the companies, the creators of ChatGPT and other popular A.I. platforms, over copyright issues associated with its written works. The lawsuit, filed in Federal District Court in Manhattan, contends that millions of articles published by The Times were used to train automated chatbots that now compete with the news outlet as a source of reliable information.
The suit does not include an exact monetary demand. But it says the defendants should be held responsible for “billions of dollars in statutory and actual damages” related to the “unlawful copying and use of The Times’s uniquely valuable works.” It also calls for the companies to destroy any chatbot models and training data that use copyrighted material from The Times.
Note that the company claimed “billions of dollars” in damages. While OpenAI, Anthropic, and other generative AI foundation model companies are out raising billions or tens of billions of dollars to fund model training, they may also need to set aside cash to pay for legal judgments or out-of-court settlements.
Generative AI a Threat to Journalism?
The lawsuit was filed in the Southern District of New York earlier today. The rationale for the complaint is summarized in its first two clauses:
Independent journalism is vital to our democracy. It is also increasingly rare and valuable. For more than 170 years, The Times has given the world deeply reported, expert, independent journalism. Times journalists go where the story is, often at great risk and cost, to inform the public about important and pressing issues. They bear witness to conflict and disasters, provide accountability for the use of power, and illuminate truths that would otherwise go unseen. Their essential work is made possible through the efforts of a large and expensive organization that provides legal, security, and operational support, as well as editors who ensure their journalism meets the highest standards of accuracy and fairness. This work has always been important. But within a damaged information ecosystem that is awash in unreliable content, The Times’s journalism provides a service that has grown even more valuable to the public by supplying trustworthy information, news analysis, and commentary.
Defendants’ unlawful use of The Times’s work to create artificial intelligence products that compete with it threatens The Times’s ability to provide that service. Defendants’ generative artificial intelligence (“GenAI”) tools rely on large-language models (“LLMs”) that were built by copying and using millions of The Times’s copyrighted news articles, in-depth investigations, opinion pieces, reviews, how-to guides, and more. While Defendants engaged in widescale copying from many sources, they gave Times content particular emphasis when building their LLMs—revealing a preference that recognizes the value of those works. Through Microsoft’s Bing Chat (recently rebranded as “Copilot”) and OpenAI’s ChatGPT, Defendants seek to free-ride on The Times’s massive investment in its journalism by using it to build substitutive products without permission or payment.
The Times is claiming that its business model is under threat. Despite strong recent financial performance, the rise of generative AI-powered chatbots could siphon off users from news organizations. It would be a significant blow if that led to fewer subscriptions or lower advertising revenue. With that said, it is unclear whether any harm exists today.
OpenAI has previously contended that its use of web data is covered under the fair use doctrine, which, among other things, hinges on whether the use of the information is transformative. For example, this article cites information from news articles but uses it within a transformative piece of analysis. The lawsuit addresses this point directly:
Publicly, Defendants insist that their conduct is protected as “fair use” because their unlicensed use of copyrighted content to train GenAI models serves a new “transformative” purpose. But there is nothing “transformative” about using The Times’s content without payment to create products that substitute for The Times and steal audiences away from it.
This will ultimately decide the case—if it is allowed to go that far. Google faced somewhat similar charges about Google Books nearly two decades ago. However, it prevailed in court. The Atlantic reported in 2015:
A federal circuit court made clear that Google Books is legal. A three-judge panel on the Second Circuit ruled decisively for the software giant against the Authors Guild, a professional group of published writers which had alleged that Google’s scanning of library books and displaying of free “snippets” online violated its members’ copyrights.
Negotiation by Other Means
The Google Books scenario is a key reason why it is unlikely this case will go to trial. The New York Times and other premium publishers cannot risk a trial with an uncertain outcome. It would be too costly if they were to lose. The Information’s Cory Weinberg put it this way:
But before you start dreaming of a high-profile trial with Sam Altman, Satya Nadella and A.G. Sulzberger taking the stand, you should view this for what it is: a negotiating tactic. It would be far too risky for The Times to go to trial over how the fair use doctrine—which allows limited use of copyrighted material—applies to artificial intelligence models. A court determining that OpenAI was operating legally would cut off The Times from getting a cut of the licensing revenue it seeks.
The New York Times admits that its goal has been to reach a negotiated solution. It simply has not received an offer it considers lucrative enough; indeed, it is unclear whether it has received any offer at all. According to the lawsuit:
The Times objected after it discovered that Defendants were using Times content without permission to develop their models and tools. For months, The Times has attempted to reach a negotiated agreement with Defendants, in accordance with its history of working productively with large technology platforms to permit the use of its content in new digital products (including the news products developed by Google, Meta, and Apple). The Times’s goal during these negotiations was to ensure it received fair value for the use of its content, facilitate the continuation of a healthy news ecosystem, and help develop GenAI technology in a responsible way that benefits society and supports a well-informed public.
These negotiations have not led to a resolution….
OpenAI has announced negotiated deals with The Associated Press and Axel Springer. It also announced a partner program in November. This means it has an active program and is cutting deals with publishers. The lawsuit suggests the offers are simply not lucrative enough for The Times to accept. In fact, The Times is trying to fill a revenue hole created by the collapse of its deal with Meta to feature its reporting for Facebook users. According to The Information:
[The New York Times] made headlines earlier this year when it struck a $100 million, three-year deal with Google to showcase Times stories in certain news products. But that barely offset The Times’ lost revenue from when Meta Platforms stopped paying the newspaper to feature the Times’ content in the Facebook app, securities filings show.
OpenAI and other generative AI companies represent an opportunity for more licensing revenue. However, the LLM developers don’t need content from every news publisher. Once a developer has a critical mass of high-quality news content for curated datasets, the value of each subsequent deal is bound to fall unless the content occupies a valuable niche, such as financial services, or carries a premium brand name.
The New York Times qualifies in the latter category, but it is by no means the only premium publisher available for partnership. Consider that The Washington Post and The Wall Street Journal provide coverage of many of the same stories as The Times, and you can see why OpenAI might not be eager to meet demands for large licensing fees.
I’m not convinced The New York Times is negotiating from a position of strength. There are two significant considerations in the dispute:
1. The use of New York Times articles as training data for an LLM
2. The use of New York Times article content in responses to user queries
The wild card in this scenario is a court ruling that the use of copyrighted material as LLM training data is not covered under “fair use.” This could lead to substantial fees for any LLM developer that used copyrighted material in foundation model training and would give publishers like The Times more negotiating leverage.
However, The Times is surely wary of losing in court as the Authors Guild did. The Google Books conflict covered only the second consideration; news publishers could lose on both counts. That scenario would essentially guarantee zero licensing revenue flowing to publishers. So, this will require a balancing act if The New York Times is looking to maximize its risk-adjusted returns.
At the same time, OpenAI and Microsoft could also lose on both counts. This would offer The New York Times a much stronger negotiating position along with the potential for copyright infringement penalties. This means OpenAI and Microsoft may have a higher incentive to negotiate. Regardless, The New York Times is not negotiating in a vacuum. Its negotiations will be influenced by the deals that OpenAI is striking with The Times’ news reporting competitors.
Does The New York Times Have Leverage?
OpenAI may very well cut a deal with The New York Times. It is already doing deals with other publishers. However, it is unlikely to cave to overzealous demands from The Times. The content just isn’t so differentiated that OpenAI must have it in the same way that Spotify needs deals with each of the three major record labels.
If Spotify were to fail to reach an agreement with Universal Music, Warner Music, or Sony, it would leave a big hole in its music library. Customers would notice. This isn’t the case for news. There may be a difference in quality, but not a gap in availability.
This means The New York Times probably doesn’t have much leverage. If it does a deal with OpenAI, the company is more likely to be a terms taker than a terms maker. If OpenAI does a generous deal with The Times, it will be out of personal regard for the publisher rather than in response to market forces.
Why Microsoft?
If you are wondering why Microsoft is named in the lawsuit, it would appear to be a strategy to get another player in the negotiations that has a lower tolerance for risk than OpenAI, has deeper pockets, and is less willing to engage in an open conflict with a leading news organization. Microsoft reportedly owns 49% of OpenAI and uses its technology extensively. So, its inclusion is a solid legal strategy by The New York Times.
Then again, if Microsoft doesn’t want to settle, The Times just added a high-powered legal department to its adversary’s team. What could go wrong?
The media industry in the United States is already in steep decline, with a two-party democracy in very poor shape. Synthetic media and the impact of GPT technologies may be the straw that breaks the camel’s back in America’s relationship with journalism, a free media, and truthful sources of content.
We could be witnessing the end of capitalism and democracy as we know them. Generative A.I. might be a harbinger not of greater profitability and productivity, but of more monopoly capitalism and fake digital content eating away at institutions like a free media. The NYT has a point, and OpenAI is not a good Samaritan in its relationship with media.