NVIDIA Drives Higher GPU Performance and Sets New Standard in MLPerf Benchmark
The industry-leading GPUs for generative AI training are getting better
NVIDIA’s H100 GPUs are the gold standard for generative AI workloads, and the company just released new data suggesting that its lead over competitors may be widening. H100s remain in short supply, and the industry is waiting for even higher performance from the forthcoming GH200 Grace Hopper Superchips. In the meantime, NVIDIA has announced new software that can double the H100’s inference performance.
The H100 GPU delivered four times the performance of the previous-generation A100 in the recent GPT-J 6B inference test and 2.6 times the performance in the Llama 2 inference test. Pairing the H100 with the new TensorRT-LLM software can double its performance again, giving it an 8x edge over the A100. According to the announcement:
NVIDIA HGX H100 systems that pack eight H100 GPUs delivered the highest throughput on every MLPerf Inference test in this round.
Grace Hopper Superchips and H100 GPUs led across all MLPerf’s data center tests, including inference for computer vision, speech recognition and medical imaging, in addition to the more demanding use cases of recommendation systems and the large language models (LLMs) used in generative AI.
…
NVIDIA developed TensorRT-LLM, generative AI software that optimizes inference. The open-source library — which was not ready in time for August submission to MLPerf — enables customers to more than double the inference performance of their already purchased H100 GPUs at no added cost.
NVIDIA said in a media briefing that it strives to double the performance of every GPU before its end of life. The performance gains are typically driven by software and architecture changes.
Strong Performance Over Competitors
NVIDIA’s comparison against competitor results shows the H100 with about a 25% performance advantage over Intel Habana Labs and a 10x advantage over Google’s TPU on the GPT-J 6B test at the 99% accuracy target, and double Intel’s performance at the 99.9% accuracy target. Other chip manufacturers did not submit results for many of the tests, and NVIDIA claims to be the only company that provided data for all of them.
Alleviating GPU Scarcity
Two key takeaways relate to the ongoing NVIDIA GPU shortage. First, the new software should help ease the shortage by extracting more throughput from chips already in use. The software is open source and available free of charge.
Second, the new GH200 chip is expected to deliver at least 17% more performance than the H100, which should generate demand from companies with extensive generative AI training and inference needs. That will likely lead to an even larger revenue bump for NVIDIA in the coming months.
Google, Amazon, and others have their own chips for running AI workloads and are investing in new, higher-performance products. For now, however, NVIDIA dominates in both performance and customer preference, and it appears to be making it harder for competitors to catch up.
Falcon 180B Shows That Open Models Can Rival Performance of Leading Proprietary LLMs
The Technology Innovation Institute (TII) has released the Falcon 180B large language model (LLM), the successor to its Falcon 40B foundation model released earlier this year. Falcon 180B debuted at the top of Hugging Face’s Open LLM Leaderboard for “open models,” and tests show it performs on par with Google’s PaLM 2 Large LLM.