Synthedia
Nvidia is Becoming the Giant of Generative AI

New GPUs, text-to-image models, an LLM service to compete with OpenAI, and more

Bret Kinsella
Mar 21, 2023
Image source: Nvidia

Nvidia is making dozens of announcements today, and many are directly related to hardware, software, and services for generative AI. A January post by Andreessen Horowitz stated:

Behind the scenes, running the vast majority of AI workloads, is perhaps the biggest winner in generative AI so far: Nvidia … They’ve built strong moats around this business via decades of investment in the GPU architecture, a robust software ecosystem, and deep usage in the academic community. One recent analysis found that Nvidia GPUs are cited in research papers 90 times more than the top AI chip startups combined.

Nvidia has also invested heavily in the Omniverse platform, designed to help developers seamlessly integrate AI workloads across a variety of software tools. Today’s announcements deepen Nvidia’s current product features for running AI models and supporting AI-related development. They also move Nvidia directly into new areas of the generative AI value chain, where it will compete directly with the current industry leaders.

Picasso

The new Nvidia Picasso service brings Nvidia into direct competition with OpenAI’s DALL-E and Stability AI’s Stable Diffusion. It also offers new services to companies that are using OpenAI and Stable Diffusion technology. Picasso represents three new text-to-image / text-to-visual offerings:

  1. Hosting your generative AI foundation model - Hosting and optimizing your existing text-to-image model for training and inference

  2. Creating your generative AI model - Develop your own text-to-visual foundation model based on Nvidia technology

  3. Customizing a generative AI model from a third party - Fine-tuning and hosting an existing text-to-image model sourced from a third party

Examples of companies taking advantage of the first service include Wombo and Runway. They already have generative AI models and now use the new Nvidia service to optimize performance.

“The second workflow would be you have lots of data but don’t know how to train a huge generative AI foundation model that’s scaling to thousands of GPUs. So, bring your data. We can help you train your foundation model, and then we host it for you … as a business-to-business API call into your applications,” said Kari Ann Briski, vice president of software product management for AI at Nvidia. Examples of current customers for this service are Shutterstock and Getty.

Regarding the third new service, Briski added, “If you are a business and don’t have a lot of data, and you can’t train your own foundation model, but you are able to use these foundation models from a partner [with the Picasso service] that you can fine-tune and customize, and we host it for you as well.” Examples of third-party text-to-image models that Nvidia customers may use include DALL-E and Stable Diffusion.
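To make the three workflows concrete, here is a minimal sketch of how a customer request for the third workflow might be assembled. The field names, model names, and payload shape below are hypothetical illustrations of the idea, not Nvidia's published Picasso API:

```python
import json

def build_finetune_request(base_model: str, dataset_uri: str, steps: int = 1000) -> str:
    """Assemble a JSON payload for a hypothetical Picasso-style fine-tuning
    request. All field names here are illustrative, not Nvidia's actual API."""
    payload = {
        "workflow": "customize",        # workflow 3: fine-tune a third-party model
        "base_model": base_model,       # e.g., a partner text-to-image model
        "training_data": dataset_uri,   # customer-supplied image/caption dataset
        "fine_tune_steps": steps,
        "host_after_training": True,    # Nvidia hosts the tuned model as an API
    }
    return json.dumps(payload)

request_body = build_finetune_request("partner/text-to-image-v1", "s3://example/brand-images")
```

Workflows 1 and 2 would differ mainly in whether the customer supplies a finished model to host or raw data for Nvidia to train against.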


Services 1 and 3 from the list above relieve in-house teams of the need to build expertise in hosting, training, and operating generative AI models. In that way, the new Nvidia Picasso offerings compete with various AI infrastructure hosting services and Microsoft’s new Azure OpenAI Service.

The second service listed above represents direct competition with Stable Diffusion and a substitute for using someone else’s model. For example, while Shutterstock is currently working with Nvidia on a new 3D generative image model, you could imagine it replacing its existing DALL-E-driven text-to-image service with a new foundation model created in collaboration with Nvidia.

It is worth noting that creating foundation models with Picasso requires the use of Nvidia services teams to run the program.

New Edify Text-to-Visual Models

Picasso is the service, and Edify is the name of the new collection of text-to-visual models that Nvidia is promoting. These models include:

  • Text-to-image

  • Text-to-video

  • Text-to-3D

Getty is using the Edify text-to-image model as part of the Picasso foundation model creation service. Shutterstock is using the Edify text-to-3D model to create new 3D assets from its image catalog. While the text-to-image segment is maturing quickly and already has a lot of market momentum, text-to-video and text-to-3D are segments still in their infancy. This may be a market where Nvidia could take the lead as opposed to playing catch-up with existing services and models.

Another noteworthy point is that you will hear Nvidia and its partners use the term “responsible content attribution.” This speaks to the controversial topic of attribution for artists whose images were used in AI model training. Nvidia is in the early stages of developing new models with Adobe that support sourcing contributions by artists. However, this does not currently contemplate tracing a generated image back to the specific training images that influenced it.


The Getty approach is the most likely model for these programs currently in development. Getty is only using licensed content for training its foundation model. While the company will not be able to match a generated image directly with source images to indicate provenance, it will know the entire set of images that could have influenced the production.
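In code, that set-level notion of provenance could look like the sketch below: the service keeps a manifest of every licensed image in the training set, and any generated output is attributed to that whole pool rather than to individual source images. The names here are my own illustration, not Getty's or Nvidia's implementation:

```python
import hashlib

# Manifest of licensed training images: ID -> content hash.
# (Toy data; a real manifest would cover the full licensed catalog.)
licensed_manifest = {
    "img-0001": hashlib.sha256(b"image-bytes-0001").hexdigest(),
    "img-0002": hashlib.sha256(b"image-bytes-0002").hexdigest(),
}

def influence_set(manifest: dict) -> set:
    """Every output could have been influenced by any image in the manifest,
    so attribution is to the whole licensed set, not a specific source image."""
    return set(manifest)

attribution = influence_set(licensed_manifest)
```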

NeMo (aka Megatron!)

Nvidia’s announcement about its NeMo text-to-text LLM service summarizes the solution as:

The NeMo cloud service that enables developers to make large language models (LLMs) more relevant for businesses by defining areas of focus, adding domain-specific knowledge and teaching functional skills … Models of varying sizes — from 8 billion to 530 billion parameters — available on the service are regularly updated with additional training data.

Nvidia’s Briski told me that the models behind the NeMo service are identical to the open-source models the company has previously published. Though Nvidia may contest the characterization that the NeMo service is competitive with OpenAI or other generative AI services, it is clearly a direct substitute.

Granted, Briski added that Nvidia is only focused on business-to-business applications, and “Most of our customers that we are working with want the flexibility to go on-prem. So, the ability to do a flexible deployment and run anywhere is the niche we are trying to hit.” An on-premises deployment is not an option today for OpenAI’s models.

Briski also suggested that users can already access the foundational open-source NeMo Megatron models today, so this isn’t an entirely new product. The announcement centers on the service packaging Nvidia has created to provide additional value for developers.

NeMo is offering three model sizes today, known as small, medium, and large. These equate to 8 billion, 43 billion, and 530 billion parameter models. All were trained on over 1 trillion tokens—a far larger data set than GPT-3—and the large model has undergone additional fine-tuning.

The standard context window for these models is 2k tokens, with an option of 4k. The models have been tested with 8k context windows, and a new feature will enable larger windows after additional testing is completed. The 4k option matches GPT-3.5’s window, is half the 8k window of the standard GPT-4 model, and is well below GPT-4’s 32k maximum.
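For developers, the practical consequence of these limits is budgeting prompt and completion tokens against the window. A rough sketch of that bookkeeping, using naive whitespace splitting instead of a real tokenizer:

```python
def fits_context(prompt: str, max_completion_tokens: int, window: int = 2048) -> bool:
    """Crude check that a prompt plus its completion budget fits a context
    window. Real services count subword tokens; whitespace splitting is only
    a rough approximation for illustration."""
    prompt_tokens = len(prompt.split())
    return prompt_tokens + max_completion_tokens <= window

short_ok = fits_context("Summarize this paragraph.", max_completion_tokens=256)
long_doc = "word " * 3000  # roughly 3,000 whitespace tokens
needs_4k = not fits_context(long_doc, max_completion_tokens=256)           # too big for 2k
fits_4k = fits_context(long_doc, max_completion_tokens=256, window=4096)   # fits the 4k option
```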

Many people are unaware that the GPT-4 API with the largest 32k context window costs double the 8k version and three to six times more than the GPT-3.5 Davinci model. NeMo pricing is not currently available.
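Using OpenAI's published March 2023 list prices (in cents per 1,000 tokens), the multiples work out as follows:

```python
# OpenAI list prices, March 2023, in cents per 1,000 tokens.
GPT4_8K = {"prompt": 3, "completion": 6}
GPT4_32K = {"prompt": 6, "completion": 12}
DAVINCI = 2  # GPT-3.5 text-davinci-003, flat rate

ratio_32k_vs_8k = GPT4_32K["prompt"] / GPT4_8K["prompt"]  # 2.0: double the 8k price
ratio_low = GPT4_32K["prompt"] / DAVINCI                  # 3.0: 3x Davinci on prompts
ratio_high = GPT4_32K["completion"] / DAVINCI             # 6.0: 6x Davinci on completions
```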

The Plot (and Competition) Thickens

There is a lot of focus on the battle between OpenAI plus Microsoft on one side and Google on the other. There is also interest in how the independent LLMs and text-to-image models will compete with the tech giants. Nvidia just announced, if indirectly, that it is competing head-on with all of these core technology providers and offering a packaged alternative for developers looking to create and operate generative AI applications.

© 2023 Bret Kinsella