From the Star Wars franchise to America’s Got Talent, Respeecher is getting noticed in Hollywood. It is a long way from Kyiv, Ukraine, to Los Angeles, but entertainment studios are finding Respeecher and putting the company to work. The result has been a few Emmy awards and global recognition.
Speech-to-Speech
Alex Serdiuk, Respeecher’s CEO, joined Voicebot’s Eric Schwartz at the Synthedia Conference to discuss the company’s voice cloning work. It is worth noting that Respeecher provides speech-to-speech technology. Many people are more familiar with text-to-speech (TTS). A TTS system replicates a voice by running recordings of a particular speaker through a specially trained deep learning model. Typed or generated text is then “read” by the speech engine, which says the words in the speaker’s synthetic voice. Amazon Alexa and Apple Siri are both examples of this TTS approach.
Speech-to-speech (STS) also replicates a voice via recordings processed through a deep learning model. However, the input is speech instead of text. A voice actor generally speaks the dialog or passage, and the STS engine transforms the utterances into the replicated speaker’s voice. The goal is a higher-fidelity voice recreation. Because the engine renders what the voice actor actually said in the replicated voice, the synthetic speech keeps the inflection, volume, and prosody of the spoken passage. TTS solutions don’t have those inputs and, in a sense, “guess” how the words should be spoken.
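For readers who think in code, the contrast can be sketched as a toy example. This is only an illustration of the idea described above, not Respeecher’s method or any real speech library: the `Frame` type, the `toy_tts` and `toy_sts` functions, and the numeric fields are all hypothetical stand-ins, and real systems work on audio features rather than three hand-picked values.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Frame:
    """One short slice of speech, reduced to illustrative values (hypothetical)."""
    intonation: float  # relative rise (+) or fall (-) in pitch
    volume: float      # loudness from 0.0 to 1.0
    voice: str         # label standing in for whose voice this sounds like


def toy_tts(words: List[str], target_voice: str) -> List[Frame]:
    """Text input: there is no performance to copy, so use a flat default delivery."""
    return [Frame(intonation=0.0, volume=0.5, voice=target_voice) for _ in words]


def toy_sts(performance: List[Frame], target_voice: str) -> List[Frame]:
    """Speech input: keep the actor's delivery and swap only the voice label."""
    return [Frame(f.intonation, f.volume, target_voice) for f in performance]


# A voice actor delivers two slices with a distinct rise and fall.
actor_take = [Frame(0.8, 0.9, "actor"), Frame(-0.3, 0.2, "actor")]

sts_result = toy_sts(actor_take, "target")
tts_result = toy_tts(["hello", "there"], "target")
```

In the toy STS result, the actor’s rise, fall, and volume survive and only the voice label changes. The toy TTS result has the right voice label but a delivery it had to make up, which is the “guess” mentioned above.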
This difference doesn’t matter all that much for voice assistants executing tasks on a consumer’s behalf. It makes a big difference in a theatrical performance. Thus, Hollywood has discovered some of the unique benefits of STS and Respeecher. A demonstration of how this works appears just a couple of minutes into the interview.
Time and Language Translations
Alex also discusses other examples of STS in the interview with Eric. In the famous Star Wars franchise example, the voice work behind a young Luke Skywalker, portrayed by Mark Hamill, went largely unnoticed until the series creator mentioned it in an interview.
Everyone seemed to know that The Mandalorian was using face-swap technology to de-age Hamill’s face. However, it would not have made sense for a twenty-something Luke Skywalker to have the voice of a 68-year-old Hamill. So a de-aged voice-swap from Respeecher was added to the face-swap. Respeecher did this again in the Obi-Wan Kenobi series to de-age the voice of James Earl Jones’s Darth Vader.
For the young Elvis Presley on AGT, Respeecher followed a similar approach, but in that instance, it was resurrecting the voice of a deceased singer.
The video concludes with a musical performance by singer Aloe Blacc singing his popular song Wake Me Up in five different languages. He doesn’t know any of those languages, but watch the video to see and hear his multilingual voice clone. The technology is fun to see in production.
Alex also spends time talking about how the technology is being used to support war efforts in Ukraine by transforming celebrity messages into Ukrainian. Let me know what you think.