OpenAI Racks Up Data and Content Deals with News Corp, Vox, and The Atlantic

More high-quality data for OpenAI, less incentive to give into NYT demands

Jun 03, 2024

OpenAI closed three new data and content deals last week with news publishers NewsCorp, Vox, and The Atlantic. This follows earlier deals with The Associated Press, Axel Springer, LeMonde, Prisa, and The Financial Times. Each of the deals enables OpenAI to utilize the publishers’ content catalogs to train their AI foundation models. They also provide access to the current news via OpenAI’s ChatGPT. Publishers receive fees for access to their content and ensure relevance with consumers if generative AI chat assistants become an important channel for news consumption.

News Corp

The NewsCorp agreement is the most substantial in terms of content depth and breadth. According to the announcement:

OpenAI has permission to display content from News Corp mastheads in response to user questions and to enhance its products…OpenAI will receive access to current and archived content from News Corp’s major news and information publications, including The Wall Street Journal, Barron’s, MarketWatch, Investor’s Business Daily, FN, and New York Post; The Times, The Sunday Times and The Sun; The Australian, news.com.au, The Daily Telegraph, The Courier Mail, The Advertiser, and Herald Sun; and others…
In addition to providing content, News Corp will share journalistic expertise to help ensure the highest journalism standards are present across OpenAI’s offering.

There was no explicit mention of NewsCorp receiving compensation from OpenAI or plans to use OpenAI tools internally. However, those were elements of earlier deals and are likely included in the “multi-year agreement.”

Vox Media

Vox Media positioned its OpenAI agreement as “strategic.” In addition to providing access to content, Vox says it plans to collaborate with OpenAI to launch “innovative products” and integrate the technology with the Forte data platform.

Vox Media’s respected portfolio of properties including Vox, The Verge, Eater, New York Magazine, The Cut, Vulture, and SB Nation, will help inform ChatGPT’s 100 million users, receiving brand attribution and audience referrals. The two companies will also collaborate using OpenAI’s technology to develop innovative products for Vox Media’s consumers and advertising partners…
Additionally, OpenAI will enhance its technology with Vox Media’s archives, which contain a deep well of reliable and accountable information and journalism…Vox Media will also use OpenAI technology to extend the leadership of Forte, its first-party data platform…Advertisers will benefit from the OpenAI partnership through even stronger creative optimization and audience segment targeting, leading to even higher campaign performance.

The Atlantic

The Atlantic also positioned the OpenAI agreement as “strategic.” The publisher intends to launch a new microsite that showcases its AI products, including collaboration with OpenAI.

The Atlantic’s articles will be discoverable within OpenAI’s products, including ChatGPT, and as a partner, The Atlantic will help to shape how news is surfaced and presented in future real-time discovery products. Queries that surface The Atlantic will include attribution and a link to read the full article on theatlantic.com(opens in a new window).
As part of this agreement, The Atlantic and OpenAI are also collaborating on product and tech: The Atlantic’s product team will have privileged access to OpenAI tech, give feedback, and share use-cases to shape and improve future news experiences in ChatGPT and other OpenAI products. The Atlantic is currently developing an experimental microsite, called Atlantic Labs, to figure out how AI can help in the development of new products and features to better serve its journalism and readers––and will pilot OpenAI’s and other emerging tech into this work.

What it Means

OpenAI has three objectives driving these deals. First, it needs high-quality data to train its AI foundation models. The race among AI model developers is transitioning from competition around accumulating more data to the quest for higher-quality data. Data quality has been a key driver in recent benchmark performance improvements of several AI foundation models.

Second, OpenAI would like to provide a better user experience for ChatGPT subscribers. The company believes it can grow the current 100 million weekly active users and increase usage frequency by adding more timely news coverage with attribution links. This puts OpenAI on a collision course with Google and the company’s key patron, Microsoft.

Finally, each licensing deal ensures that OpenAI will not be sued for its previous use of content from the publishers in training its earlier models. Though the company believes it had the right to use that public data to train its models, these agreements serve the purpose of reducing the risk of future lawsuits.

This is also more bad news for The New York Times. Instead of cutting a deal with OpenAI, the publisher filed a lawsuit. News publishers’ content varies in quality, and some have niche reporting that may be unique. However, publishers largely deliver similar news. In addition, OpenAI is primarily interested in high-quality content to train its models. Given the breadth of deals that OpenAI has already cut, the value of a deal with The New York Times has surely diminished. OpenAI may still attempt to strike an agreement, but its incentives to bend to the publisher’s demands lessen with every new publisher deal it inks.

OpenAI and Microsoft Sued by New York Times for Copyright Infringement

Bret Kinsella

December 28, 2023

OpenAI and Microsoft Sued by New York Times for Copyright Infringement

That didn’t take long. Yesterday, we wrote about Anthropic’s new indemnity protection for users. That article concluded with the line: 2024 will be a year when generative AI goes to court. Earlier this morning, the New York Times filed a lawsuit naming OpenAI and Microsoft over the unauthorized use of copyrighted material to train large language models (L…

Read full story

Google Gemini 1.5 and Flash LLMs Show Significant Advances Hidden in Research

Bret Kinsella

June 2, 2024

Google Gemini 1.5 and Flash LLMs Show Significant Advances Hidden in Research

Google is developing a habit of releasing research papers and then simply adding information to the versions hosted in Deepmind servers without versioning. This makes it harder to know when the information was added and what was added. During Google I/O, the company showed off the latest Gemini 1.5 Pro and Flash large language models (LLM) but didn’t me…

Read full story

Jurgen Gravestein

Jun 3, 2024

I wonder if OpenAI (apart from collecting more training data) is preparing, given all these content deals, some kind of rich news-reading experience inside ChatGPT.

I could imagine users 'enabling' their favorite news sources inside the ChatGPT interface and then, if one asks, "Hey ChatGPT, what's new today" providing users with a curated generative news experience.

Expand full comment

1 reply by Bret Kinsella

1 more comment...