Rumors?? Google Enlists DeepMind and Uses ChatGPT Data to Catch Up with OpenAI
Google's deficit may be worse than expected
Google’s near absence from the ongoing generative AI competition has once again taken center stage with new rumors about its strategy to catch up. In the absence of verifiable information, rumors inevitably fill the vacuum. Then again, rumors are sometimes proven true.
Using ChatGPT Data to Train Bard?
The Information reported yesterday that Jacob Devlin, a Google AI engineer, resigned in January after informing company leadership that ChatGPT data was being used to train Bard. According to the account:
Devlin quit after sharing concerns with Pichai, Dean and other senior managers that the Bard team, which received assistance from Brain employees, was training its machine-learning model using data from OpenAI’s ChatGPT. Specifically, Devlin believed the Bard team appeared to be relying heavily on information from ShareGPT, a website where people publish conversations they’ve had with ChatGPT.
Some Google employees felt the use of such chat logs would have violated OpenAI’s terms of service, which prohibit the use of “output…to develop models that compete with OpenAI,” according to its website. Devlin also told executives he was concerned that by relying on the ChatGPT chat logs from ShareGPT, Bard’s answers would mimic or resemble those of ChatGPT too much.
It is unclear whether the use of ChatGPT output republished on a third-party site violates any terms of service. How would this differ from scraping the same information from Twitter or an article in The Wall Street Journal? Large language models (LLMs) are trained largely on internet data. Did any LaMDA-generated data ever wind up in OpenAI’s data set?
I can understand the concern about using training data that might cause Bard to mimic ChatGPT. You might consider this an unethical form of reverse engineering, or you might simply want to build a differentiated and more useful product. At the same time, it seems reasonable that people could hold different opinions on this matter.
This strikes me as less incendiary than many of today’s headlines are making it out to be. But, again, I’m sure reasonable minds can differ on this point.
DeepMind to the Rescue?
The Information also reports the open secret that DeepMind and Google Brain have competed far more than they have collaborated since DeepMind was acquired in 2014. However, OpenAI’s momentum has changed this dynamic and led to the creation of the Gemini project, a “joint effort [that] began in recent weeks.” According to the story:
The two have been competing to improve some of Google’s products and services and to gain global notoriety for research breakthroughs. Now, though, employees at both of Alphabet’s AI labs agree that OpenAI has outflanked them…
Gemini is a forced marriage. Alphabet’s two AI labs have seldom collaborated or shared computer code with one another. But now, because both wanted to develop their own machine-learning model to compete with OpenAI and needed an inordinate amount of computing power to do so, they had little choice but to work together, said a person with knowledge of the situation.
The Enemy of My Enemy is My Friend?
DeepMind’s leaders were protective of their independence and negotiated a great deal of autonomy as part of the acquisition. In practice, Google has seldom forced DeepMind to follow its orders, fearing that the organization’s top talent would leave the company. But crises, real or perceived, change attitudes and assumptions.
The DeepMind, Google Brain, and broader Google AI teams do not want to be second to OpenAI in a technology to which they have devoted their careers. That makes collaboration both thinkable and predictable. A common enemy can bind rivals together.
However, that doesn’t mean the marriage will be happy. Large egos are still at play, along with different perspectives, bodies of knowledge, and experiences. It will take time for the teams to pool their knowledge, align their objectives, and learn to collaborate efficiently.
This may be a necessary step for Google, but there is a plausible scenario in which the collaboration actually slows the effort to catch up to OpenAI. It may increase the likelihood of eventual success, but it seems optimistic to expect the move to shorten the process. In addition, the move suggests Google only recently realized the extent of its gap with OpenAI and decided, once again, to change its approach.
Is the Model Enough?
Most of this intrigue centers on the performance of the LaMDA and PaLM LLMs. That is an important gap. However, OpenAI’s lead extends beyond model performance. ChatGPT was an alignment success: the technology behind the model was effectively aligned with the use case through embeddings, fine-tuning, the user interface, and the user interaction model.
OpenAI introduced GPT-3 to the market in 2020 and has been working on alignment for nearly three years, both directly with third-party developers and through its own research. The technology improvements that led to InstructGPT, ChatGPT, and GPT-4 proceeded in parallel.
Google may be further behind in alignment than in foundation model performance. And if you are using reinforcement learning from human feedback (RLHF), you have humans in the loop, which makes it hard to compress the time required to make advances.
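Why can’t that time be compressed with more compute? A minimal, hypothetical Python sketch illustrates the bottleneck: the preference data used to train an RLHF reward model comes from pairwise human comparisons, so throughput is bounded by labeler hours rather than hardware. All names here (`human_preference`, `collect_preference_data`, the placeholder completions) are illustrative assumptions, not any lab’s actual pipeline.

```python
# Hypothetical sketch of the RLHF labeling bottleneck: each reward-model
# training example needs a human to compare two candidate outputs, so
# data collection scales with labeler time, not with compute.

def human_preference(output_a: str, output_b: str) -> str:
    """Stand-in for a human labeler choosing the better completion.
    In a real pipeline this step costs seconds to minutes of a
    person's attention per comparison; here we fake it by preferring
    the longer answer."""
    return output_a if len(output_a) >= len(output_b) else output_b

def collect_preference_data(prompts):
    """Gather the (prompt, chosen, rejected) triples that a reward
    model would later be trained on."""
    dataset = []
    for prompt in prompts:
        # Two placeholder candidate completions per prompt.
        a = f"{prompt} (concise answer)"
        b = f"{prompt} (longer, more detailed answer)"
        preferred = human_preference(a, b)  # the human-in-the-loop step
        dataset.append({
            "prompt": prompt,
            "chosen": preferred,
            "rejected": b if preferred == a else a,
        })
    return dataset

data = collect_preference_data(["What is RLHF?", "Summarize this article."])
```

The point of the sketch: every element of `data` required one human judgment, which is why a newcomer cannot simply buy its way past the alignment phase.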
Finally, OpenAI has built a large ecosystem of development partners using its API. That effort just went into overdrive with the launch of ChatGPT plugins. OpenAI has a lead in technology, alignment, partner ecosystem, mindshare, and market share. And its lead seems to be growing in each area. Google is a legendary fast follower but has a big challenge ahead.