Generative Text and Images Level Up Virtual Human Videos at Hour One
Helping users express themselves better
Using large language models (LLMs) like GPT-3 has gained a foothold in the content creation process. The tyranny of the blank page is real: expressing an idea effectively is daunting when you are starting from nothing. Arnon Kahani, head of engineering at Hour One, joined me recently for an interview about a new LLM feature in the company’s Reals automated video generation solution.
Hour One provides a gallery of virtual human presenters and scenes that can be selected with a couple of clicks. The user just needs to type in some text, and they have a presenter-led video ready to publish in a few minutes. However, until this week, they needed to take on that blank page challenge.
Script Wizard is a new feature that uses GPT-3 to write the script for the user or revise text the user has already written. The revision feature lets users shorten or lengthen the segment script, change its tone, or add details. Arnon Kahani commented on Script Wizard and the GPT-3 integration:
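Hour One has not published implementation details, but a script-revision feature like this can be sketched as a thin wrapper around a text-completion model. The helper below builds a rewrite prompt from the user's segment script and a revision instruction (shorten, lengthen, change tone); the prompt wording, instruction names, and the `client` object are all illustrative assumptions on my part, not Hour One's actual API.

```python
# Sketch of a GPT-style script-revision helper. The prompt template,
# instruction names, and client wiring are illustrative assumptions only.

REVISION_INSTRUCTIONS = {
    "shorten": "Rewrite the script below so it is roughly half as long.",
    "lengthen": "Expand the script below with one or two supporting details.",
    "casual_tone": "Rewrite the script below in a friendly, casual tone.",
}


def build_revision_prompt(script: str, revision: str) -> str:
    """Combine a canned revision instruction with the user's segment script."""
    instruction = REVISION_INSTRUCTIONS[revision]
    return f"{instruction}\n\nScript:\n{script}\n\nRevised script:"


def revise_script(client, script: str, revision: str) -> str:
    """Send the prompt to a completion endpoint (client is hypothetical)."""
    prompt = build_revision_prompt(script, revision)
    return client.complete(prompt, max_tokens=300)
```

In practice, the "regenerate" behavior described above would just mean calling `revise_script` again with a nonzero sampling temperature so each run returns a different rewrite.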
“Hour One is on a mission to enable anyone to create presenter-led videos. Our focus is on humanizing these videos, conveying messages better, and triggering an emotional response. Hour One has a platform called Reals, which enables users, with relative ease of use, to create videos by just inputting text, a few clicks, adding images, and getting a professional grade video…
“For Hour One we are always looking for new technologies to help our users to create better stories and narratives, and [we are] using GPT and large language models to bridge the gap from just an idea to a compelling story.”
Generative AI and Virtual Humans
An important Synthedia thesis is that the various technology segments within synthetic media and generative AI have value on their own. However, they are more powerful when combined and provide new value. Below is an example I created using Hour One's Reals virtual human video generator and the new Script Wizard feature. In this case, I started with a short script about the company's announcement and used the GPT-3-powered feature to rewrite and extend the ideas in the text. I did this for multiple scenes, which together comprise the one-minute video.
It is definitely easier to use GPT-3 to get started. Sometimes it does a good job of revising the script, and other times I prefer the original version. If you don’t like it, you can just regenerate and get another version in a few seconds. Most virtual human companies that provide presenter-led video generation strive for ease of use. This feature just takes that a step further. I suspect all of these solutions will have automated script-generation features sometime in 2023. It’s an obvious feature extension.
By the way, it wasn’t part of the announcement, but Hour One also recently added a Stable Diffusion feature to Reals. If you need an image to show up behind the virtual human presenter, you can upload one of your own, select from some stock photos, or generate something new using a text prompt. Kahani said:
“I think the Stable Diffusions and the GPT-3s of this world will be the infrastructure of the generative AI space. They are the tools that need to be used within the platforms.”
Once again, we see existing platforms simply integrating generative AI tools to make it easier for users during the creation process.
Helping People Better Express Themselves
There is some debate about whether generative AI truly expresses the ideas of the author. Kahani has a nice way of framing this: he just wants people to be able to express themselves more effectively. Video is one tool for doing that; better storytelling is another. Whether users have an idea but lack the skill to turn it into a compelling story, or have a good story they need to express well in another language, these tools offer tangible value.