How to Make Your Own Deepfake Avatar - Tencent, D-ID, Synthesia and Other Options
Create your own personal deepfake in just a couple of minutes
Tencent has announced a new personalized deepfake creation service that costs just $145 and will provide your avatar in 24 hours.
Synthesia offers a service for $1,000 per avatar per year, and other companies, such as Hour One and Soul Machines, will do this for special projects.
D-ID enables you to create a deepfake digital twin video in just a couple of minutes, and for shorter videos, you can get your first produced for free. Scroll down to see an example.
Tencent announced on April 25th that you can now create your own deepfake “digital human” avatar for just $145 (about 1,000 yuan). According to The Register, it requires “three minutes of video, 100 sentences of speech, and 24 hours” to get your own personalized digital twin.
The digital characters are available in half bodies or full bodies, and the service is available in both Chinese and English.
Some aspects, like background and tone, are customizable. The videos avoid the flat intonation and single speech rhythm that plague traditional acoustic models by using an in-house small-sample timbre customization technology that relies on deep learning acoustic models and neural network vocoders.
Chen Lei, general manager of Tencent Cloud Intelligent Digital Human Products, said the web colossus hopes to build an automated "AI+ Digital Intelligent Human Factory" and rely on a self-service one-stop platform for production, sales and service.
The Register seems to have drawn most of this information from an article in the Chinese news publication Jiemian. A translated version of that article included additional details:
Regarding the technical characteristics of Tencent Digital Human, Wang Chengjie, research director of Tencent Youtu Lab, said that behind the 2D small sample technology is 3D technology.
At present, Tencent Cloud Intelligent Digital Sapiens has covered five image styles: 3D realistic, 3D semi-realistic, 3D cartoon, 2D real person, and 2D cartoon. It can realize ultra-fine facial emotional expressions and hundreds of body movements, and supports image asset management, business service configuration, and content production services.
The product is not yet listed on Tencent Cloud's English-language website. However, the $145 appears to be a one-time fee. This is in sharp contrast to companies like Synthesia that charge $1,000 to create a digital twin and then another $1,000 per year to maintain it.
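To put that pricing gap in perspective, here is a minimal Python sketch of the cumulative cost of each model over several years. It assumes the figures quoted above hold as stated: a one-time $145 fee for Tencent with no recurring charge, and a $1,000 creation fee plus $1,000 per year of maintenance for Synthesia. The function name and the plan dictionaries are hypothetical and only there for illustration. Actual pricing terms may differ.

```python
def cumulative_cost(setup_fee: int, annual_fee: int, years: int) -> int:
    """Total spend after `years` of maintenance, in US dollars."""
    if years < 0:
        raise ValueError("years must be non-negative")
    return setup_fee + annual_fee * years


# Assumed figures taken from the prices quoted in this article.
PLANS = {
    "Tencent": {"setup_fee": 145, "annual_fee": 0},        # appears to be one-time
    "Synthesia": {"setup_fee": 1000, "annual_fee": 1000},  # create + yearly maintenance
}

if __name__ == "__main__":
    for years in (1, 3, 5):
        for name, plan in PLANS.items():
            total = cumulative_cost(plan["setup_fee"], plan["annual_fee"], years)
            print(f"{name:10s} after {years} year(s): ${total:,}")
```

Under these assumptions, the gap widens every year the avatar stays in use, since only one of the two models carries a recurring charge.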
Tencent appears to be going after social media influencers, small business owners, and professionals like “doctors and lawyers.” Synthesia is clearly targeting larger companies that want to create unique brand ambassadors or digital human twins of company executives that can be used for internal communications.
A Faster Way to Create Your Digital Twin
I have tested D-ID for this purpose before and wanted to see how the service has evolved. So, how long does it take me to create a digital twin video using a photo and D-ID's Creative Reality Studio? The answer: about five minutes. However, that included locating an image and recording a short audio track.
Once those tasks were done, video generation took less than a minute. I did this using free credits, so there was no cost to get started.
The result is lifelike, and the ability to add my own audio track is a plus for realism. It also supports color images, but I liked the look of the black and white a little better. Mouth movement was the only issue I encountered. The resulting video is not how I actually move my mouth, which is more symmetrical. The video seems to depict me speaking slightly from the side of my mouth. I suspect that is because the image is slightly angled, and the models apply mouth movements from a straight-on perspective. With that said, the match of the mouth movement to the words is very good.
My impression is positive, and you cannot beat the time it takes. It is faster than any other solution I have come across. Longer videos will require a monthly subscription of $6 to $300, depending on how many minutes you plan to produce, what watermarks you find acceptable, and whether you plan on commercial use. Once you create the videos, you can download the files, share the videos, or maintain them in the Creative Reality Studio library.
You can try creating your own deepfake here. If you do, include a link in the comments below so we can all see how it turned out.
Digital Twins for Celebrities and Media
D-ID also provides an API if you have a development team and want to create an interactive character. Other companies have implemented a similar model for high-resolution, 3D-like animated characters, but their processes are more involved.
In the 2022 Synthedia events, Soul Machines featured NBA player Carmelo Anthony while YouTuber Dom Esposito showed off his digital twin created by Hour One. Marc Scarpa of DeFiance Media also showed the world’s first digital news anchor. These were more involved projects because they required high-resolution video capture in a studio.
Dom and Carmelo are real people in the public sphere already. Roxanna from DeFiance Media was based on a model that is also a real person. If you are creating a deepfake or digital twin of a celebrity or for commercial purposes, taking the extra time and cost for these projects can often make sense. The offerings from Tencent and D-ID may be suitable for these use cases as well, but the lower cost and time commitment will make them practical for a wider variety of applications.
Deepfake and Digital Twin Adoption
I mentioned in yesterday’s post that people are using the same terms to mean different things and often misrepresent a term for convenience or out of ignorance. A deepfake can render the likeness of an individual in a video, as an interactive avatar, or simply make a real person look younger. A digital twin can replicate the likeness of a person as they are today or as a younger version of themselves. Right now, other companies doing the same thing will often say it is generative AI and not a deepfake. No wonder there is so much confusion.
One thing that does seem clear is that we will see a lot more of this. Lensa captured many people’s attention by creating heroic illustrated renderings of their likenesses as an image. Image-to-video solutions like Tencent and D-ID can take a Lensa image, animate it, and either add synthetic audio or sync it to an audio file you supply. You can also use a photograph, illustration, or the output from a text-to-image generator.
While yesterday’s post considered the many issues faced by media companies and celebrities and the need for tools to identify deepfakes, you can see that the technology is also becoming more accessible to everyday users. The overused term in technology circles about the democratization of access is, in fact, true in this instance. Two years ago, you needed to be a skilled developer to create a deepfake or digital twin. No-code solutions exist today, and they are likely to proliferate.
Easy and low-cost access is sure to push up initial usage. The question is whether that trial will turn into lasting adoption and whether the novelty will solidify into sustained use cases outside the business segments. This is a critical assumption behind digital human market forecasts that suggest digital human revenue will grow from a few billion dollars today to $400-$600 billion within a decade.
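For a sense of how aggressive those forecasts are, the short Python sketch below computes the compound annual growth rate they imply. The forecasts only say "a few billion dollars" for today's revenue, so the $3 billion starting point is my own assumption for illustration, as is the ten-year horizon for "within a decade." The function name is hypothetical. Change the starting value and the result shifts accordingly.

```python
def implied_cagr(start_revenue: float, end_revenue: float, years: float) -> float:
    """Compound annual growth rate needed to go from start to end in `years`."""
    if start_revenue <= 0 or end_revenue <= 0 or years <= 0:
        raise ValueError("all inputs must be positive")
    return (end_revenue / start_revenue) ** (1 / years) - 1


if __name__ == "__main__":
    START_BILLIONS = 3.0  # assumption: stands in for "a few billion dollars"
    YEARS = 10            # assumption: "within a decade"
    for end_billions in (400.0, 600.0):
        rate = implied_cagr(START_BILLIONS, end_billions, YEARS)
        print(f"${START_BILLIONS:.0f}B -> ${end_billions:.0f}B "
              f"in {YEARS} years implies {rate:.0%} growth per year")
```

With that assumed starting point, the implied growth works out to well over 50 percent per year for a full decade, which is why sustained consumer adoption matters so much to these numbers.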
Movie studios, musicians, and companies are already capturing clear benefits. How widespread will consumer use become? Let me know what you think. And drop a link to your own deepfake / digital twin in the comments if you decide to create one.