Do Large Language Models Have Emergent Abilities?
Three Stanford researchers challenge a popular idea.
“Emergence is when quantitative changes in a system result in qualitative changes in behavior.” Philip Anderson, 1972
“An ability is emergent if it is not present in smaller models but is present in larger models.” Wei, Tay, Bommasani, Raffel, Zoph, Borgeaud, Yogatama, et al., 2022
“[The presence of] emergent abilities may be creations of the researcher’s choices, not a fundamental property of the [AI] model family on the specific task.” Schaeffer, Miranda, Koyejo, 2023
Can new abilities spring forth from large language models as they scale? Stanford’s Institute for Human-Centered Artificial Intelligence (HAI) summarizes a new research paper in a post titled “AI’s Ostensible Emergent Abilities Are a Mirage.”
“With bigger models, you get better performance, but we don’t have evidence to suggest that the whole is greater than the sum of its parts,” said Stanford second-year graduate researcher Rylan Schaeffer in an interview with HAI.
The report responds to a popular thesis that large language models (LLMs) sometimes exhibit capabilities they were not explicitly trained for and that are not present in smaller models. A research paper by Google and DeepMind researchers from October 2022 stated:
Scaling up language models has been shown to predictably improve performance and sample efficiency on a wide range of downstream tasks. This paper instead discusses an unpredictable phenomenon that we refer to as emergent abilities of large language models. We consider an ability to be emergent if it is not present in smaller models but is present in larger models. Thus, emergent abilities cannot be predicted simply by extrapolating the performance of smaller models. The existence of such emergence raises the question of whether additional scaling could potentially further expand the range of capabilities of language models.
Stanford’s Schaeffer, Miranda, and Koyejo contend in a new paper that this thesis only appears to be confirmed because of the metrics researchers chose to measure model performance.
Recent work claims that large language models display emergent abilities, abilities not present in smaller-scale models that are present in larger-scale models. What makes emergent abilities intriguing is two-fold: their sharpness, transitioning seemingly instantaneously from not present to present, and their unpredictability, appearing at seemingly unforeseeable model scales. Here, we present an alternative explanation for emergent abilities: that for a particular task and model family, when analyzing fixed model outputs, one can choose a metric which leads to the inference of an emergent ability or another metric which does not. Thus, our alternative suggests that existing claims of emergent abilities are creations of the researcher's analyses, not fundamental changes in model behavior on specific tasks with scale.
What Is This All About?
This argument is not about hallucinations. It is about whether LLMs actually gain discontinuous abilities that appear unpredictably. The most often cited example is arithmetic, but it could be any new model capability, such as reasoning, game playing, or mapping a knowledge domain.
A few researchers suggest that these abilities seemingly spring from nowhere once a model reaches a certain size. That has fueled some people’s wishful predictions that increasing size brings a disproportionate increase in unpredictable capabilities. In other words, the value of an LLM may scale faster than its inputs, and there is no way to tell what will emerge or when.
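To make the metric-choice argument concrete, here is a minimal sketch of my own, using made-up numbers rather than data from either paper. It assumes per-token accuracy on a ten-token answer improves smoothly with the log of model size. Scoring the same outputs with exact-string match, which counts a response as correct only if every token is right, produces a sharp, emergence-like jump, while the per-token metric shows steady, predictable progress.

```python
import numpy as np

# Hypothetical model sizes: log10 of parameter counts from 1e8 to 1e11.
log_params = np.linspace(8, 11, 13)

# Assumed smooth improvement: per-token accuracy rises linearly with log-scale.
per_token_accuracy = np.interp(log_params, [8, 11], [0.50, 0.99])

answer_length = 10  # tokens the model must get exactly right

# Discontinuous metric: exact match requires every token to be correct,
# so it equals per-token accuracy raised to the answer length.
exact_match = per_token_accuracy ** answer_length

for lp, smooth, sharp in zip(log_params, per_token_accuracy, exact_match):
    print(f"10^{lp:.2f} params | per-token {smooth:.2f} | exact match {sharp:.3f}")
```

On the continuous metric, performance climbs gradually from 0.50 to 0.99. On exact match, it sits near zero for most of the range and then shoots up, which is the kind of curve that gets labeled emergent. Nothing about the hypothetical model changed, only the yardstick.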
Why This Matters
The observation that previous research conclusions were skewed by the choice of metrics is a useful check on existing practices. That is how the scientific process works: researchers build on each other’s work and, in cases like this, help subsequent studies avoid heading down a dead end.
However, the more important factor here relates to the quest for artificial general intelligence (AGI) and fears of what it may bring. The famous Future of Life Institute open letter, signed by Elon Musk among others, which called for a six-month moratorium on advanced AI development, cited the “emergent abilities research” as evidence of a looming threat. The letter stated in part:
“Unfortunately, this level of planning and management is not happening, even though recent months have seen AI labs locked in an out-of-control race to develop and deploy ever more powerful digital minds that no one – not even their creators – can understand, predict, or reliably control.
…
“We call on all AI labs to immediately pause for at least 6 months the training of AI systems more powerful than GPT-4 … stepping back from the dangerous race to ever-larger unpredictable black-box models with emergent capabilities.”
The latest Stanford research calls the premise behind these fears into question. If the models are not unpredictable and do not exhibit emergent abilities, then the fear of sentient machines and the rise of robot overlords may be overstated.
HAI’s interview concludes with the observation, “We don’t need to worry about accidentally stumbling onto artificial general intelligence (AGI). Yes, AGI may still have huge consequences for human society, Schaeffer says, ‘but if it emerges, we should be able to see it coming.’”
A Comment on AI Fears
You may have noticed that I am skeptical of AGI alarmism. I do not think the probability of achieving AGI is zero, nor do I think very bad outcomes are impossible. Given the potential for a very bad outcome, I don’t fault people for wanting to monitor the situation. However, AGI or AI sentience seems exceedingly unlikely in the near term.
This latest paper suggests LLM performance and abilities may be more transparent and predictable than some researchers have claimed. Maybe we don’t need to take a step back from advanced AI research. Maybe we need to step back from alarmism.
I recognize that many people strongly believe it is inevitable that a superintelligent AI will emerge and that it could quickly evolve beyond human control. If you hold that belief, make your case in the comments. I’d be happy to have the discussion. For now, I just don’t see any evidence that the fantastical claims are probable. Maybe the evidence is there, but the “emergent abilities” thesis looks much weaker in light of the new research.