Generative AI Voice Clones - From Joe Rogan and Steve Jobs to Elon Musk and South Park with PlayHT
Make an expressive voice clone with only 10 seconds of audio
Voice clones are starting to show up everywhere. One place I was not expecting to hear them was in South Park. I did expect to see ChatGPT lampooned in the 2023 South Park season, but it had not occurred to me that a voice clone would also play an active role. Nor did I expect to hear a largely realistic podcast episode that never happened between Joe Rogan and Steve Jobs. Nor did I anticipate that so many everyday consumers would want to create voice clones.
But all three phenomena came to pass. PlayHT’s co-founders, Mahmoud Felfel and Hammad Syed, took the stage to discuss their technology and present a demo at the Synthedia 3 Generative AI Innovation Showcase. The demo included a fun scenario of trash-talking between Mark Zuckerberg and Elon Musk. You can access a video of the full presentation and fireside chat with me by clicking the image above.
Generative Creation and Efficiency
Felfel and Syed also shared that their original synthetic voice model was trained on 50,000 hours of speech and their current model on more than one million hours. That enables PlayHT to clone your voice in your native language or other languages. It can add an accent or remove one. And a basic voice clone can be generated from just 10-20 seconds of speaker audio. According to Felfel:
"This model can clone voices from only 20 seconds of audio--10 seconds even, across a huge range of accents and languages and style of talk, and emotions of the voice and so on...We really think the future of media creation will be more generative than created by humans. Humans will not sit in a room anymore to record a voice...This will come to all modalities...You will get it done in a fraction of the time.
"A very clear example of this for us...was a couple of months ago, when South Park, they used [PlayHT] in one of their episodes. Traditionally, how would they do that? They would go hire voice over actors, have auditions, and get them in a studio to create the episode. It takes time and hours and that process takes more time and iteration and audio engineers and so on....But when they used us, it was only about one hour of creating the episode. It was a huge difference between those two approaches."
This was an intriguing conversation about the role of voice clones and synthetic voices in creating entertainment, extending the brand presence of online influencers, adding new interest to video games, and more. There are more use cases and more demand for these services than may be immediately obvious.