LLMs Are Not Inventions, They Are Discoveries. How That Fact Changes Adoption and Outlook.
Reframing how you think about generative AI
In his recent interview on the Lex Fridman podcast, Jeff Bezos said something that may help people understand what is happening with large language models (LLMs). It is the type of insight that often comes from someone who is not on the front lines but has the perspective of distance coupled with the knowledge of an insider and intimate familiarity with innovation. Bezos said:
It's interesting to me that large language models in their current form are not inventions, they're discoveries. You know, the telescope was an invention, but looking through it at Jupiter, knowing that it had moons was a discovery. “My God, it has moons.” And that's what Galileo did…
We know exactly what happens with a 787. It's an engineered object. We designed it, we know how it behaves. We don't want any surprises.
Large language models are much more like discoveries. We're constantly getting surprised by their capabilities. They're not really engineered objects.
Inventions vs. Discoveries vs. Capabilities
Bezos uses an analogy of a tool that helped discover things that already existed. Jupiter’s moons existed before the telescope. We needed the device to know they were there. In many ways, the telescope was invented to discover. A Boeing 787 was invented (or engineered) to transport people and goods over long distances. It was not invented to discover knowledge. The aspirations behind its development were more down-to-earth.
LLMs were not invented to solve a particular problem. They were invented, and we are discovering what they can do.
This is not to say people didn’t have objectives in mind when developing LLMs. OpenAI clearly stated its intention to create artificial general intelligence (AGI). However, there was no single definition of how AGI or an LLM would provide value. An LLM’s capability to write a poem, analyze a spreadsheet, or summarize a long document is not a discovery like spotting Jupiter’s moons for the first time. LLM capabilities were not waiting for someone to come along and notice them. Nor are the capabilities emergent. The invention enables new capabilities that did not exist in previous machines.
Setting aside the semantics of the analogy, the spirit of Bezos’ idea still holds. It is the difference between what something is created to do and what it can do. An invention may be created for one thing but then applied to others. There are many examples of this. The steam engine. Gore-Tex. Adhesive microspheres. The internet. These analogies are better than the telescope for understanding LLMs and the impact they will have.
Consider the internet. Gmail, Waze, WhatsApp, Netflix, YouTube, Uber, Airbnb, Amazon, Chess.com, and other products and services were enabled by the internet and the World Wide Web. None of them was the intended outcome.
For example, someone figured out that LLMs could probably write software code, given their training data. They could. When more training data related to coding was added and the models were fine-tuned for software development, the output was even better.
The Art of the Possible
Few people anticipated that ChatGPT, Midjourney, Pika Labs, or GitHub Copilot would exist anytime soon. LLMs, diffusion models, and GANs were invented, and these solutions were engineered on top of them. Those solutions are already having a big impact, and many others still in their infancy will follow.
New products often evolve through stages of research, development, and commercialization. There is still core research underway, but much of the work now is applied engineering. We are in the era of discovering new applications and refining them for commercial use. LLMs are remarkable because they have been so fruitful in enabling new capabilities. Bezos was right to reframe the conversation so people can better understand how to think about generative AI. Discover what the technology enables, and then turn it into a useful solution.
That is precisely what Bezos did when founding and building Amazon as an online bookstore and then an “everything store.” He was discovering what the web enabled. He did it again with AWS and discovered the value cloud computing could deliver.
This is a key reason why people have been surprised by the rapid adoption of LLMs despite the incidence of hallucinations. Sometimes, new capabilities are so impactful that the benefits outweigh the issues. Explorers understand that discovery often comes with tradeoffs. They also understand that value must be found.
LLMs Are Here to Save Us
Bezos also had an interesting comment on safety, which many people overlooked. While we need to consider the risks, we should also consider the profound benefits the technology could introduce.
Then you know, you have this debate about whether they're gonna be good for humanity or bad for humanity. You know, even specialized AI could be very bad for humanity. I mean, just regular machine learning models that can make, you know, certain weapons of war that could be incredibly destructive are very powerful. And they're not general AIs, they're just, they could just be very smart weapons. And so we have to think about all of those things.
I'm very optimistic about this. So even in the face of all this uncertainty, my own view is that these powerful tools are much more likely to help us and save us even than they are to, on balance, hurt us and destroy us. I think, you know, we humans have a lot of ways of, we can make ourselves go extinct. You know, [LLMs] may help us not do that… They may actually save us.
My friend Greg Cross from Soul Machines said in mid-2022, “The robots are coming…and they are just in time.”
Entering 2024, you may want to think about what generative AI capabilities could accomplish. Then, work backward to your engineered solution.
I feel like the LLM is the invention, but next-word prediction, the fundamental concept beneath it, is the discovery.
Listen to Ilya Sutskever and you’ll notice that he is fascinated by the fact that something that looks a lot like intelligence appears just from training a model to predict its own inputs, no special objective function required.
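To make that concrete, here is a minimal sketch of the next-word prediction objective in PyTorch. Everything in it is a toy assumption for illustration: random token IDs stand in for real text, and a simple embedding-plus-linear model stands in for a transformer. Only the loss reflects how LLMs are actually trained: predict each token from the tokens that came before it.

```python
# A toy sketch of next-token prediction, the single objective behind LLM
# training. Sizes, data, and model are illustrative assumptions, not a
# real LLM; real models use transformers trained on trillions of tokens.
import torch
import torch.nn.functional as F

vocab_size, embed_dim = 50, 32
tokens = torch.randint(0, vocab_size, (1, 64))  # random IDs stand in for text

# Toy "language model": embed each token, project back to vocabulary logits.
embed = torch.nn.Embedding(vocab_size, embed_dim)
head = torch.nn.Linear(embed_dim, vocab_size)
optimizer = torch.optim.Adam(
    list(embed.parameters()) + list(head.parameters()), lr=1e-3
)

for step in range(100):
    inputs, targets = tokens[:, :-1], tokens[:, 1:]  # targets are inputs shifted by one
    logits = head(embed(inputs))                     # (batch, seq, vocab)
    # The entire objective: maximize the probability of each next token.
    loss = F.cross_entropy(logits.reshape(-1, vocab_size), targets.reshape(-1))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

Nothing else is specified here: no grammar, no facts, no tasks. Whatever capabilities emerge in real LLMs come from this one objective applied at enormous scale, which is exactly what makes the result feel like a discovery.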
This same principle may explain the mechanism of learning in the human brain, too (see Karl Friston’s work on predictive processing).