Nvidia Is Becoming the Giant of Generative AI
New GPUs, text-to-image models, an LLM service to compete with OpenAI, and more
Nvidia is making dozens of announcements today, and many are directly related to hardware, software, and services for generative AI. A January post by Andreessen Horowitz stated:
Behind the scenes, running the vast majority of AI workloads, is perhaps the biggest winner in generative AI so far: Nvidia … They’ve built strong moats around this business via decades of investment in the GPU architecture, a robust software ecosystem, and deep usage in the academic community. One recent analysis found that Nvidia GPUs are cited in research papers 90 times more than the top AI chip startups combined.
Nvidia has also invested heavily in the Omniverse platform, designed to help developers seamlessly integrate AI workloads across a variety of software tools. Today’s announcements deepen Nvidia’s current product features for running AI models and supporting AI-related development. They also move Nvidia directly into new areas of the generative AI value chain, where it will compete directly with the current industry leaders.
Picasso
The new Nvidia Picasso service brings Nvidia into direct competition with OpenAI’s DALL-E and Stability AI’s Stable Diffusion. It also offers new services to companies that are using OpenAI and Stable Diffusion technology. Picasso represents three new text-to-image / text-to-visual offerings:
Hosting your generative AI foundation model - Hosting and optimizing your existing text-to-image model for training and inference
Creating your generative AI model - Developing your own text-to-visual foundation model based on Nvidia technology
Customizing a generative AI model from a third party - Fine-tuning and hosting an existing text-to-image model sourced from a third party
Examples of companies taking advantage of the first service include Wombo and Runway. They already have generative AI models and now use the new Nvidia service to optimize performance.
“The second workflow would be you have lots of data but don’t know how to train a huge generative AI foundation model that’s scaling to thousands of GPUs. So, bring your data. We can help you train your foundation model, and then we host it for you … as a business-to-business API call into your applications,” said Kari Ann Briski, vice president of software product management for AI at Nvidia. Examples of current customers for this service are Shutterstock and Getty.
Regarding the third new service, Briski added, “If you are a business and don’t have a lot of data, and you can’t train your own foundation model, but you are able to use these foundation models from a partner [with the Picasso service] that you can fine-tune and customize, and we host it for you as well.” Examples of third-party text-to-image models that Nvidia customers may use include DALL-E and Stable Diffusion.
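Nvidia has not published a public API specification for Picasso alongside the announcement, so the sketch below is purely illustrative of what Briski’s “business-to-business API call” might look like from the application side. The endpoint, payload fields, and response handling are all assumptions.

```python
# Hypothetical sketch only: Nvidia has not published a public Picasso API,
# so the endpoint, payload fields, and response handling are assumptions.
import requests

API_URL = "https://api.example.com/picasso/v1/generate"  # placeholder endpoint
API_KEY = "YOUR_API_KEY"  # placeholder credential

def generate_image(prompt: str) -> bytes:
    """Call a hosted text-to-image model and return the image bytes."""
    response = requests.post(
        API_URL,
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={"prompt": prompt, "width": 1024, "height": 1024},
        timeout=60,
    )
    response.raise_for_status()
    # Assumes the service returns raw image bytes; a real API might return
    # a URL or a base64-encoded string instead.
    return response.content

image = generate_image("a watercolor painting of a data center at dusk")
with open("output.png", "wb") as f:
    f.write(image)
```

The point of the packaging is that the customer’s application only ever sees a call like this; training, optimization, and hosting all happen on Nvidia’s side.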
Services 1 and 3 from the list above relieve in-house teams of the need to build expertise in hosting, training, and operating generative AI models. In that way, the new Nvidia Picasso offerings compete with various AI infrastructure hosting services and with Microsoft’s new Azure OpenAI Service.
The second service listed above represents direct competition with Stable Diffusion and a substitute for using someone else’s model. For example, while Shutterstock is currently working with Nvidia on a new 3D generative image model, you could imagine it replacing its existing DALL-E-driven text-to-image service with a new foundation model created in collaboration with Nvidia.
It is worth noting that creating foundation models with Picasso requires engaging Nvidia’s services teams to run the program.
New Edify Text-to-Visual Models
Picasso is the service, and Edify is the name of the new collection of text-to-visual models that Nvidia is promoting. These models include:
Text-to-image
Text-to-video
Text-to-3D
Getty is using the Edify text-to-image model as part of the Picasso foundation model creation service. Shutterstock is using the Edify text-to-3D model to create new 3D assets from its image catalog. While the text-to-image segment is maturing quickly and already has a lot of market momentum, text-to-video and text-to-3D are segments still in their infancy. These may be markets where Nvidia could take the lead rather than play catch-up with existing services and models.
Another noteworthy point is that you will hear Nvidia and its partners use the term “responsible content attribution.” This speaks to the controversial topic of attribution for the artists who created the images used in AI model training. Nvidia is in the early stages of developing new models with Adobe that support crediting the contributions of artists. However, this work does not currently contemplate tracing a generated image directly back to the specific images used in training.
The Getty approach is the most likely model for these programs currently in development. Getty is training its foundation model only on licensed content. While the company will not be able to match a generated image directly with its source images to indicate provenance, it will know the entire set of images that could have influenced the output.
NeMo (aka Megatron!)
Nvidia’s announcement about its NeMo text-to-text LLM service summarizes the solution as:
The NeMo cloud service that enables developers to make large language models (LLMs) more relevant for businesses by defining areas of focus, adding domain-specific knowledge and teaching functional skills … Models of varying sizes — from 8 billion to 530 billion parameters — available on the service are regularly updated with additional training data.
Nvidia’s Briski told me that the NeMo SDK is identical to the open source models that the company has previously published. Though Nvidia may contest the characterization that the NeMo service is competitive with OpenAI or other generative AI services, it is clearly a direct substitute.
Granted, Briski added that Nvidia is only focused on business-to-business applications, and “Most of our customers that we are working with want the flexibility to go on-prem. So, the ability to do a flexible deployment and run anywhere is the niche we are trying to hit.” An on-premises deployment is not an option today for OpenAI’s models.
Briski also noted that users can already access the foundational open source NeMo Megatron models today, so this isn’t an entirely new product. The announcement centers on the service packaging that Nvidia has created to provide additional value for developers.
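To make the customize-then-host workflow concrete, here is a minimal sketch of how an application might use such a service. The NeMo cloud service API has not been published, so the endpoints, field names, and model identifier below are all invented for illustration; only the three model sizes come from the announcement.

```python
# Hypothetical sketch: the NeMo cloud service API is not public, so this
# endpoint, its field names, and the model identifier are all assumptions.
import requests

BASE_URL = "https://api.example.com/nemo/v1"  # placeholder endpoint
HEADERS = {"Authorization": "Bearer YOUR_API_KEY"}  # placeholder credential

# 1. Choose a base model size; 8B, 43B, and 530B parameters per the announcement.
base_model = "nemo-large-530b"  # hypothetical model identifier

# 2. Submit domain-specific examples to customize the model. The announcement
#    describes "adding domain-specific knowledge and teaching functional
#    skills"; input/output pairs are one plausible format for that.
job = requests.post(
    f"{BASE_URL}/customizations",
    headers=HEADERS,
    json={
        "base_model": base_model,
        "training_examples": [
            {"input": "Summarize this claims report: ...", "output": "..."},
            {"input": "Draft a renewal notice for policy ...", "output": "..."},
        ],
    },
    timeout=60,
).json()

# 3. Query the customized model like any other hosted LLM. (A real service
#    would require waiting for the customization job to finish; polling is
#    omitted here for brevity.)
completion = requests.post(
    f"{BASE_URL}/completions",
    headers=HEADERS,
    json={"model": job["customized_model_id"], "prompt": "Summarize: ..."},
    timeout=60,
).json()
print(completion)
```

Whatever the final API looks like, this is the shape of the value proposition: the customer supplies domain data, and Nvidia handles training infrastructure and hosting.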
NeMo is offering three model sizes today, known as small, medium, and large. These equate to 8 billion, 43 billion, and 530 billion parameter models. All were trained on over 1 trillion tokens—a far larger data set than GPT-3—and the large model has undergone additional fine-tuning.
The standard context window for these models is 2k tokens, with an option of 4k. The models have been tested with 8k context windows, and a new feature will enable larger windows after additional testing is completed. Even the tested 8k window falls well short of GPT-4’s 32k maximum, and the 4k option matches GPT-3.5’s standard window while being half the size of the standard GPT-4 model’s 8k window.
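Context windows are measured in tokens rather than characters or words. A quick way to get a feel for what fits in a 2k versus 8k window is to count tokens with OpenAI’s open source tiktoken tokenizer; NeMo models use their own tokenizers, so the counts below are only indicative.

```python
# Token counting with OpenAI's open source tiktoken tokenizer. NeMo models
# use their own tokenizers, so these counts are only indicative of scale.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # encoding used by GPT-3.5/GPT-4
text = "Nvidia announced new generative AI services at GTC. " * 200
n_tokens = len(enc.encode(text))
print(f"{n_tokens} tokens")

# Check which context windows this text would fit inside.
for label, window in [("2k", 2048), ("4k", 4096), ("8k", 8192), ("32k", 32768)]:
    print(f"{label}: {'fits' if n_tokens <= window else 'does not fit'}")
```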
Many people are unaware that the GPT-4 API with the largest 32k context window costs double the 8k version and 3-6 times more than the GPT-3.5 Davinci model. NeMo pricing is not currently available.
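For readers who want the arithmetic behind those multiples, the sketch below uses OpenAI’s published per-1,000-token prices as of the GPT-4 launch; treat the figures as a snapshot, since OpenAI revises its pricing over time.

```python
# OpenAI's published prices in USD per 1,000 tokens as of the GPT-4 launch;
# treat these figures as a snapshot, since OpenAI revises pricing over time.
prices = {
    "gpt-4-32k": {"prompt": 0.06, "completion": 0.12},
    "gpt-4-8k": {"prompt": 0.03, "completion": 0.06},
    "davinci": {"prompt": 0.02, "completion": 0.02},  # flat rate per token
}

# 32k vs. 8k GPT-4: double the cost on both prompt and completion tokens.
print(prices["gpt-4-32k"]["prompt"] / prices["gpt-4-8k"]["prompt"])          # 2.0
print(prices["gpt-4-32k"]["completion"] / prices["gpt-4-8k"]["completion"])  # 2.0

# 32k GPT-4 vs. GPT-3.5 Davinci: 3x on prompt tokens and 6x on completion
# tokens, which is the "3-6 times higher" range cited above.
print(prices["gpt-4-32k"]["prompt"] / prices["davinci"]["prompt"])           # 3.0
print(prices["gpt-4-32k"]["completion"] / prices["davinci"]["completion"])   # 6.0
```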
The Plot (and Competition) Thickens
There is a lot of focus on the battle between OpenAI + Microsoft and Google. There is also interest in the independent LLMs and text-to-image models and how they will compete with the tech giants. Without quite saying so, Nvidia has just announced that it is competing directly with all of these core technology providers and offering a packaged alternative for developers looking to create and operate generative AI applications.