Synthetic media had a big year in 2022, even if not everyone knows it by that name. Search volume for “synthetic media” was up and down worldwide, with a notable spike in U.S. searches around the time of OpenAI’s InstructGPT announcement. In case you missed it, InstructGPT was the technology instrumental in creating ChatGPT’s conversational abilities.
However, “synthetic media” searches were superseded toward the end of the year by the term favored by venture capitalists, “generative AI.”
Granted, this term had more than a little help toward the end of the year, which we will get to in a moment. It is important to note that synthetic media is a broader category than generative AI, as it includes technologies beyond text and image generation such as virtual humans, voice clones, and deepfakes. We view generative AI as a subset of the category and as more of a tools-oriented segment, while the other categories tend toward richer experiences.
It’s All About the Products
What really boosted generative AI searches was the term’s association with four key products: three generative image models (DALL-E, Midjourney, and Stable Diffusion) and one new consumer-ready large language model application.
You can see that “generative AI” search volume and, in turn, “synthetic media” search volume were a rounding error compared to searches for the products’ names. It’s easier to brand and create memorable associations with a product than a category!
There are some other interesting patterns in the search trends for the text-to-image models. First, the peaks for the successive product launches diminish for the U.S. searches (lower chart). DALL-E was the first to make a big splash and really show the broader public what a high-quality generative AI image model can produce. It is also the flagship product in the category for OpenAI, which is the best known of the three companies in the U.S.
From a U.S. search standpoint, DALL-E has also fallen behind Midjourney and Stable Diffusion, which had much higher baseline searches in the second half of 2022. Midjourney has the benefit of being favored by many top artists, has an active Discord community, and was the first AI model to produce an output that won a real-world art contest. Stable Diffusion is an open source model that hundreds of companies and hobbyists are using and promoting. Both of those models launched major versions and updates in the second half of the year.
The worldwide figures show an even more interesting trend. First, Midjourney was far more popular worldwide than either DALL-E or Stable Diffusion. That means Midjourney had a lead in mindshare outside the U.S. for most of the year and recently surpassed OpenAI in the U.S. Midjourney was also headed for a second summit in search volume as of the end of the year.
It’s Even More About the Sizzle
However, there was one juggernaut that made all of these products a search rounding error: ChatGPT. You can see in the charts below that ChatGPT dwarfed the other generative AI products in terms of search volume after it was launched.
ChatGPT search volume was 20 times higher than Midjourney’s and 50 times higher than DALL-E’s at its peak. Even at year end, ChatGPT was eight times higher than Midjourney. One reason for this disparity is that ChatGPT required no learning curve. It was consumer-friendly when it launched. Everyone already knows how to use a chat thread. They also know how to search and how to interact with chatbots. ChatGPT seemed like a familiar and comfortable UI that produced an output well beyond expectations. It was a great combination.
The text-to-image models are not as consumer-friendly. In fact, the biggest growth area is likely to be in marketing and design platforms such as Picsart and Canva. Both have integrated Stable Diffusion image generation as a feature, and Picsart users quickly began generating over one million images per month.
Synthetic Voices, Clones, and Text-to-Speech
While there is some back-and-forth about the name of the synthetic media market overall, there is even more naming fragmentation in the synthetic speech engine market. The older terms, synthetic voice and synthetic speech, are not nearly as popular as text-to-speech and voice clone. The first two terms describe a technology, the third tells you what the technology does, and the final term is about the output achieved.
You can see the latter two terms are, by far, the most popular. Voice clone is the only term here that is consumer-friendly. The others are a bit dry but I could see synthetic voices becoming more popular over time.
What Should We Call Those Talking Avatars?
It is hard for an industry to take off if people don’t know what to call it. The talking avatars we increasingly see on websites, YouTube, TikTok, and in games are variously known as virtual humans, virtual beings, virtual people, and digital people, to name a few terms. These “talking avatars” differ from avatars, which are visual representations that must be animated by the actions of real people. Digital people are animated by AI, as you might hear Soul Machines’ Greg Cross comment.
Well, there is more nuance to this. Some of these characters are largely video capture solutions that render human-like performances while others are more like independent bots with personalities of their own. Regardless, the names used are varied even within some organizations.
Let’s set deepfake and voice clone aside for a moment. They were included because they are related to this segment and also as a point of reference. You can see that, from both a U.S. and a worldwide perspective, the search term “digital people” far outstrips the others. This synthetic media segment may help itself by standardizing on this term since it has momentum.
Notably, this term also surpasses “voice clones” by a substantial margin. You will recall this is one of the top search terms in the synthetic speech category. However, even digital people is not anywhere close to deepfake. Deepfake is not an appropriate term for all of the virtual human / digital people category. It is a technology approach that is not even used for these applications.
There is a different point to be made. Some companies in the technology space view deepfake as a term with too much negative baggage. However, I think they are missing the larger story. Deepfake is a term that is already familiar to a lot of consumers. It will likely be easier to burnish the image of the deepfake term by showing positive examples of the technology than by trying to force-feed the public a new term.
What Will 2023 Bring?
Let me know what synthetic media search terms you used in 2022. Was it largely product and company names or did it include category terms too?
It will be interesting to see how this develops in 2023.