Apple is Courting News Publishers Hoping to Use Their Content to Train a New LLM
The iPhone maker is offering $50M deals for use of the publishers' content libraries
The New York Times reported that Apple has reached out to several leading news publishers asking to use their content for training a new large language model (LLM). According to the report:
The technology giant has floated multiyear deals worth at least $50 million to license the archives of news articles, said the people with knowledge of talks, who spoke on the condition of anonymity to discuss sensitive negotiations. The news organizations contacted by Apple include Condé Nast, publisher of Vogue and The New Yorker; NBC News; and IAC, which owns People, The Daily Beast and Better Homes and Gardens.
Playing Catch-up
Apple faces two problems. Not only did it miss the significance of the generative AI revolution sparked by Google’s research in 2017-18, refined by OpenAI in 2020, and accelerated by the introduction of ChatGPT, but it also doesn’t have easy access to data.
Apple famously focuses its corporate positioning on user privacy and derides rivals whose business models are based on harvesting value from user data. It could go out and scrape the internet like everyone else or use open-source datasets such as Together AI’s RedPajama 30T. However, that will not differentiate it from peers.
We have seen that more data and better-curated datasets have led to many of the recent performance milestones. Google and Meta both have substantial proprietary datasets from their search and social media businesses. OpenAI has been working on this problem for years and is becoming a proprietary data giant through ChatGPT. It has also recently cut deals with the media organizations AP and Axel Springer. Apple is starting from scratch.
Apple may also want high-quality data to use with an upgraded Siri. If the company ever wants the voice assistant pioneer to reach parity with ChatGPT, it will need more than a search partner.
The publisher outreach is a smart move on Apple’s part, showing the company is now taking action after a long period of denial. If the company couples high-quality curated datasets with an acquisition of a tier-2 LLM, it might help close the gap. That said, it is hard to see Apple achieving any advantage in generative AI. The company more likely will be fighting for sufficiency.
Apple to the Rescue
Apple is likely to secure publisher deals for two reasons. First, it is no secret that Apple has over $160 billion of cash on its balance sheet. The company can buy access to content. It is also not a secret that many news and lifestyle publishers are struggling financially.
Taking cash from Apple will be the easiest way for publishers to harvest immediate value from their back catalog of content. And, even though Apple will surely ask for favorable terms, this is not Steve Jobs’ Apple that completely upended the business models of book and music publishers. Apple is a laggard with a big bank account, looking to partner with legacy businesses that continue to struggle with digital transformation and appear to be completely unprepared for AI transformation.
Despite this situation, some publishers are wary, according to The New York Times:
Some of the publishers contacted by Apple were lukewarm on the overture. After years of on-again-off-again commercial deals with tech companies like Meta, the owner of Facebook, publishers have grown wary of jumping into business with Silicon Valley.
Several publishing executives were concerned that Apple’s terms were too expansive, according to three people familiar with the negotiations. The initial pitch covered broad licensing of publishers’ archives of published content, with publishers potentially on the hook for any legal liabilities that could stem from Apple’s use of their content.
…
Still, some news executives were optimistic that Apple’s approach might eventually lead to a meaningful partnership. Two people familiar with the discussions struck a positive note on the long-term prospects of a deal, contrasting Apple’s approach of asking for permission with behavior from other artificial intelligence-enabled companies, which have been accused of seeking licensing deals with news organizations after they had already used their content to train generative models.
Most of these deals will happen. Money talks, and for some publishers it will arrive just in time. Even news publishers with strong businesses, like Axel Springer, are making deals related to generative AI. That deal was with OpenAI, and the press release mentioned:
This marks a significant step in both companies’ commitment to leverage AI for enhancing content experiences and creating new financial opportunities that support a sustainable future for journalism.
Apple’s Assets
Apple is a bit of an outsider among the tech giants regarding generative AI. It is unlikely to offer a business-oriented solution like an API accessing an LLM. Instead, it will focus on integrating generative AI into its line-up of consumer and business products ranging from iPhones to Mac laptops.
The company has three key assets to leverage when it comes to generative AI:
A lot of cash to buy its way into the generative AI race
A large loyal user base, particularly for iPhone, but also for its broader product portfolio
Trust
Apple is an afterthought when it comes to the generative AI competitive landscape. This is because the company is behind in AI, and the technology will exist to support existing products, similar to how Siri was added as a feature of the iPhone. However, Apple wants to have some control over its future use of the technology and how the solutions operate. The company will not sell generative AI services but will bake generative AI features into products.
It is logical to start with the data to maintain the trust Apple has built with users and support its commitment to privacy. Publishers are only a first stop on the data licensing path. Expect Apple to obtain data from many more sources in the coming year and soon claim it has a differentiated dataset powering its products’ generative AI features. How good the solutions are, and how soon they arrive, may be another matter.