At the Synthedia conference this week, I led off with an overview of the market. The presentation starts with a definition, shows some examples, and introduces a new market framework, also known as a marketecture. I encourage you to watch the video for a full review of the space. Some additional comments follow below.
Segmenting Synthetic Media
Voicebot is currently developing a synthetic media market report and one of the concepts we introduce is the industry marketecture. This is designed to help you categorize companies based on their solution footprint and conduct more nuanced analysis of the market’s growth and trajectory.
We have categorized the market into six segments: text, image, audio, video, conversation, and characters. The presentation in the video lays out the first four of those as core and the six-part segmentation as an enhancement.
Text: This includes automated text generators such as GPT-3 and other large language models.
Image: This includes AI-powered image generators ranging from Dall-E and Midjourney to Night Cafe, Stable Diffusion, and others, even including TikTok.
Audio: These are the synthetic speech engines and voice clones you are familiar with, such as text-to-speech and speech-to-speech systems. However, this segment also includes music and other sound generation technologies, as well as tools to manage AI-generated audio.
Video: This segment is generally populated by virtual beings, which are referred to by many different names. It also includes deepfake face swap technology. A key distinction is that this media type is pre-recorded, making it dynamic but not inherently interactive.
Conversation: Anyone familiar with chatbots and voice assistants will recognize this space. It covers interactive conversational features.
Characters: Characters are the most complex of the synthetic media solutions as they are interactive virtual beings and draw on several of the other industry segments to deliver a robust user experience.
Each of these segments has other category attributes, such as whether the media is visual or language-based (the latter also covering other sounds) and whether the content is static, dynamic, or interactive. You can think of the dynamic attribute as linear media, which includes prerecorded virtual humans and/or synthetic voices. The interactive attribute indicates that different users are likely to have different experiences, based on what they do during the interaction. Importantly, the user has some control over the experience.
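If it helps to see the segmentation as a data structure, here is a minimal Python sketch of the six segments and their attributes. Treat it as an illustration under stated assumptions, not the report's official mapping. The names `Basis`, `Mode`, `Segment`, and `segments_by_mode` are hypothetical. The text above only states that video is dynamic and that conversation and characters are interactive; the other attribute assignments (text and image as static, audio as dynamic, and every visual or language basis) are my guesses for the sake of the example.

```python
from dataclasses import dataclass
from enum import Enum


class Basis(Enum):
    VISUAL = "visual"
    LANGUAGE = "language"  # language-based media, plus other sounds


class Mode(Enum):
    STATIC = "static"
    DYNAMIC = "dynamic"          # linear, prerecorded media
    INTERACTIVE = "interactive"  # the user has some control over the experience


@dataclass(frozen=True)
class Segment:
    name: str
    bases: frozenset  # set of Basis values
    mode: Mode


# Assumption: only the modes for video, conversation, and characters come
# from the post. Everything else here is an illustrative guess.
SEGMENTS = [
    Segment("text", frozenset({Basis.LANGUAGE}), Mode.STATIC),
    Segment("image", frozenset({Basis.VISUAL}), Mode.STATIC),
    Segment("audio", frozenset({Basis.LANGUAGE}), Mode.DYNAMIC),
    Segment("video", frozenset({Basis.VISUAL}), Mode.DYNAMIC),
    Segment("conversation", frozenset({Basis.LANGUAGE}), Mode.INTERACTIVE),
    Segment("characters", frozenset({Basis.VISUAL, Basis.LANGUAGE}), Mode.INTERACTIVE),
]


def segments_by_mode(mode):
    """Return the names of segments with the given content mode, in list order."""
    return [s.name for s in SEGMENTS if s.mode is mode]


if __name__ == "__main__":
    for mode in Mode:
        print(mode.value, segments_by_mode(mode))
```

A flat list with a filter function is enough here. If you disagree with any of my assumed assignments, changing a single line in `SEGMENTS` is all it takes.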
A New Marketecture
I hope you find this market framework (i.e. marketecture) useful. Let me know what you think. Where is it wrong? What is most useful about it? How would you change it?
In the video, I show where some of the companies fit into the marketecture. This will surely evolve over time, and we consider it a first draft. We welcome your input!