Sam Altman is Not Raising $7T, Microsoft is Unlikely to Build a $100B Supercomputer, and the Spectre of Dark GPUs
A trend worth watching
Rumors about AI often seem to be the human version of those large language model (LLM) hallucinations that so many people fret about. Consider the headline, “GPT-4 will have 100 trillion parameters,” which appeared in several well-regarded technology blogs. Or the Wall Street Journal article titled “Sam Altman Seeks Trillions of Dollars to Reshape Business of Chips and AI.” WSJ doubled down on that with another article, “Sam Altman’s $7 Trillion Moonshot.”
The key problem with these rumors, which appeared in so many articles, is that they were untrue and not even in the ballpark of truth. GPT-4 has about 1.8 trillion parameters, as NVIDIA’s Jensen Huang conveniently confirmed for everyone at GTC 2024. That is roughly 1/50th of the claimed figure, which likely originated in a Lex Fridman talk that considered 100 trillion parameters as a hypothetical. Sam Altman personally debunked the 100 trillion figure on a StrictlyVC podcast and the $7 trillion rumor on the Lex Fridman podcast, saying, “I never said we’re raising $7 trillion, or blah, blah, blah.”
I am not pointing this out to dunk on Towards Data Science or the Wall Street Journal. Synthedia wrote last year that “Stability AI is Raising Funds at a $4 Billion Valuation.” A more accurate statement would have been “…Seeking a $4 Billion Valuation.” Granted, the company did raise at a $1 billion valuation the previous year, and Midjourney was clearly valued in the billions of dollars, so this was not a stretch. The point is that rumors abound. Some are true, while some are not. However, some rumors appear outrageous even by AI hype-cycle standards. That’s a signal in the noise.
Getting Attention
The problem is that reality is already outrageous in its own way. Three years ago, we didn’t have solutions that could write a short essay on Roman etiquette in the style of Ernest Hemingway in less than 10 seconds. Nor did we have 1.8 trillion parameter large language models or 30 trillion token open-source training datasets. These all exist today. Startling breakthroughs and enormously sized models and datasets make it harder for anyone to surprise readers. So, writers pick a much larger number to get your attention. Caveat emptor.
$100 Billion Data Center?
The latest rumor in this genre is the $100 billion data center (or supercomputer, depending on who is telling the story) planned by Microsoft and OpenAI. A very expensive supercomputer costs hundreds of millions of dollars. A very expensive data center today costs about $1 billion. Why would this project cost 100x more than the most advanced new data centers, which are also designed to process AI workloads?
Even if this was necessary, why would it be a single data center and not three or five geographically dispersed data centers networked together? Could this possibly be true? Yes. Is it likely true, as stated? No.
The story here is that OpenAI is an adherent of compute scaling as the path to AGI. Based on where the company believes it is in the AGI innovation journey, it expects to need a lot more computing power to reach the objective. Nobody knows whether AGI is possible or how much computing power would be required. Regardless, the $100 billion figure is extremely unlikely. It’s as if the writers behind these stories are willfully ignoring both common sense and the practicality and risk of such an investment approach.
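To see why the $100 billion figure strains credulity, a quick back-of-envelope calculation helps. The unit price and the share of capital spending that goes to chips below are purely illustrative assumptions for the sketch; neither number comes from the reporting:

```python
# Back-of-envelope: how many GPUs could a rumored $100B budget buy?
# All figures are illustrative assumptions, not sourced numbers.
BUDGET = 100e9               # the rumored $100 billion figure
GPU_UNIT_COST = 30_000       # assumed price per high-end AI accelerator
CHIP_SHARE_OF_CAPEX = 0.5    # assume half the budget goes to chips
                             # (the rest to buildings, power, networking)

gpus = BUDGET * CHIP_SHARE_OF_CAPEX / GPU_UNIT_COST
print(f"Implied GPU count: {gpus:,.0f}")  # on the order of 1.7 million

# For comparison, a $1B AI data center under the same assumptions:
gpus_1b = 1e9 * CHIP_SHARE_OF_CAPEX / GPU_UNIT_COST
print(f"A $1B facility implies roughly {gpus_1b:,.0f} GPUs")
```

Under these assumed numbers, a single $100 billion site would house well over a million accelerators, a concentration of hardware, power draw, and failure risk that makes the multi-site alternative mentioned above look far more plausible.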
Dark GPUs on the Horizon
An idea you are likely to start hearing more about, and one closer to reality, is the risk of a glut of AI chips in the market. The dot-com bubble drove a lot of optimism around how quickly web usage would rise and the internet capacity that growth would require. Telecom companies and others began building out fiber optic networks to carry all of the anticipated digital transport demand. Then the dot-com bust arrived. For years, much of the fiber optic data transport capacity went unused and was not “lit up,” which led to the term “dark fiber.”
We are likely to see a similar phenomenon transpire in the AI chip market, which is currently driven largely by advanced GPU-based chips. Early signs of dark GPUs have already appeared in some areas. Demand for NVIDIA GPU-based AI chips outstripped supply for much of 2023. That led to a buying frenzy based on overly optimistic forecasts of AI compute capacity needs. There have already been market imbalances in different segments, with excess capacity in some (e.g., Stability AI, Imbue) and supply shortages in others (e.g., the big cloud providers, particularly Microsoft Azure, along with AI technology developers).
Market imbalances typically lead to excess ordering as companies attempt to avoid a shortage that undermines their ability to meet their business objectives. Those behaviors are likely to lead to an AI chip oversupply in 2024. This will be a bad outcome for the companies that over-purchased very expensive GPUs, but it could be good news for companies looking to consume AI computing power for foundation model training and inference workloads. Dark GPUs will reduce prices as infrastructure owners look to recapture some revenue from their investments.