One of the Creators of Tacotron 2 Discusses Lifelike Synthetic Voices and Hyper-Creation with AI
Altered AI was created to advance these objectives using generative AI
Ioannis “Yannis” Agiomyrgiannakis is the CEO of Altered AI, which offers hyper-automation and hyper-creation generative AI tools for synthetic speech production. This interview was conducted at the Synthedia 2 online conference and included demos of speech-to-speech and text-to-speech solutions.
Before founding Altered AI, Yannis spent nearly eight years as a research scientist at Google, working on the Tacotron 2 research and patent and other ground-breaking synthetic speech innovations. Tacotron 2 significantly advanced the quality of synthetic speech generated from text.
In our interview, Yannis discusses the advances and limits of text-to-speech and how speech-to-speech technology can leverage AI models to augment or modify the voice output and deliver humanlike performances. These features enable a single actor to convincingly perform multiple roles, speak in multiple languages, and take on a variety of accents. Yannis commented:
“Text-to-speech systems are limited natively by the limited representational capabilities of text. So, because text is limited, text-to-speech is also limited in what it can convey. When you want to make perfect human-computer interfaces and make a perfect machine algorithm that can represent a human, you need to solve this problem.
“So, speech, on the other hand, has emotional qualities and information inside [of it]. It is able to convey a performance. We are combining various versions of text-to-speech and speech-to-speech.”
Altered AI’s solution is essentially a voiceover studio where you can efficiently mix and experiment with multiple spoken audio performances. It enables you to capture, modify (i.e., alter), and produce audio tracks for games, entertainment, and other use cases.
Hyper-creation
Beyond advances in synthetic speech, a good reason to listen to Yannis’ interview is his explanation of hyper-creation and how it differs from hyper-automation. The two are often confused because, with generative AI, hyper-creation is based on the rapid production of novel artifacts, which makes it look like automation. However, Yannis explains:
“One of the things I observed working with creatives, artists, actors is that they have the concept of an experiment inside their way of practicing. So, when you are doing an experiment, you are basically doing a trial. You are trying to figure out whether a particular set of hypotheses are working out for you or not.
“If you increase the experimentation rate — if you allow someone to make more experiments in the same amount of time or more complicated experiments — then we increase the productivity of the experimentation itself. And this is the hyper-creation. It is basically hyper-automation on things that you don’t really know how you want them.
“If you know what you want, that’s great. That’s hyper-automation. But if you don’t know what you want, if you are exploring, if you are in the exploration phase, whether you are a researcher in a lab or an actor trying out some ideas, or whether you are a script writer or any sort of creative, you can benefit from a faster experimentation rate. And that is hyper-creation. It’s hyper-automation in the creative ideas space.”
Generative AI’s biggest contribution so far is not the mindlessly repeated automation of a single, unaltered task. It enables greater productivity in experimentation by automating the production of creative works. These tasks previously eluded automation, but tools that offer these benefits are suddenly abundant.