$90 million funding round
Valuation at $1 billion
50,000 customers
12 million videos
Investors include Accel and NVIDIA
Synthesia, the AI avatar video platform (and the company whose name is closest in spelling to Synthedia 😀), announced a $90 million Series C funding round. The company has raised $157 million to date.
“The round, which values the company at $1 billion, is led by Accel with investment from NVentures, NVIDIA’s venture capital arm, and participation from existing investors Kleiner Perkins, GV, FirstMark Capital, Alex Wang, Olivier Pomel, and Amjad Masad.
With a year-over-year user growth rate of 456% and over 12 million videos generated on the platform to date, the company has consistently driven triple-digit growth while serving more than 50,000 businesses across the globe.”
NVIDIA’s participation in the round is notable given its strong push into generative AI and its historical focus on video rendering. The 456% annual user growth is also a positive sign.
You can usually discount large growth-rate claims from startups that do not reveal any nominal figures, because it is easy to show strong percentage growth from a very small base. However, Synthesia was already one of the leading companies in the space, so a 456% increase, which means the user base is roughly 5.6 times its size a year earlier, suggests significant traction and recurring customers. It may also signal that broader industry growth is taking hold after several years of modest traction.
AI Avatars / Virtual Humans / Digital People
Synthesia is best known for its video production automation platform for training, sales, marketing, and customer service. The platform lets users add a script and some images, select a virtual human avatar, and generate a video with the avatar as host.
Companies ranging from Soul Machines and Hour One to DeepBrain and D-ID are focused on the fusion of “digital people” with educational and informational videos. These used to be called virtual humans, but more recently, companies have preferred the term “digital people” or the broader term “AI avatar.” The former is a narrower classification covering photorealistic human renderings. The latter category, in Synthedia’s taxonomy, can extend to non-human creatures, cartoons, or other stylized characters.
Are Digital People Generative AI?
Everyone seems to agree that large language models (LLMs) and text-to-image AI models are categories of generative AI. However, some are unsure whether “digital people” belong in this category or whether the software companies simply want to be attached to the latest and hottest technology trend. Synthedia’s view is that “digital people” are indeed a category of generative AI and synthetic media.
At a basic level, nearly all of the videos in this space use synthetic speech engines to generate the avatar voices, and text-to-speech that produces new audio from a script is a generative AI technology. In addition, the rendering of the avatar and its movements is now often the result of AI model training. In particular, matching the avatar's mouth movements and emotional expressions to the spoken script is another layer of AI. So, the generative AI label is appropriate.
Some software companies in the space, such as Hour One and D-ID, have also introduced generative text and image features by tapping into OpenAI’s GPT-3 API. In many ways, AI avatar video platforms are where generative text, speech, images, and video converge.