Meta (Facebook) Demos Text-to-Video

The Text-to-X Era is in Full Bloom

Sep 30, 2022

**Prompts // Top left:** A dog wearing a superhero cape flying through the sky. **Top right:** Hyper-realistic spaceship landing on Mars. **Bottom left:** An artist's brush painting on a canvas close up, highly detailed. **Bottom right:** A horse drinking water.

Meta (can we still just call them Facebook?) just announced a text-to-video service called Make-A-Video. It is not generally available yet, but you can sign up for access, and Meta will contact you as it opens up slots for users. That said, the outputs take what we have seen from text-to-image solutions to a new level. According to the company:

Make-A-Video research builds on the recent progress made in text-to-image generation technology built to enable text-to-video generation. The system uses images with descriptions to learn what the world looks like and how it is often described. It also uses unlabeled videos to learn how the world moves.

Instant Meme Generator

I’m assuming the first popular use of the core product will be creating memes as short videos and GIFs. Sometimes you just want a visual to make a point. Text-to-image and text-to-video solutions may usher in a new wave of communication through symbols: first emojis, then GIFs, and now text-to-visuals.

Meta is surely cherry-picking some of the best outputs, since there is no general access for others to test the solution. However, the results are still impressive. The brevity of some of the prompts, relative to the quality of the resulting output, is particularly noteworthy.

**Prompt** // a teddy bear painting a portrait

**Prompt** // A golden retriever eating ice cream on a beautiful tropical beach at sunset, high resolution

Add Motion to an Image

Make-A-Video is also venturing into mashup territory for video and images. You can take a still image, and the solution will add motion to it. I’m not sure the example will render properly in your email client, but you can click through to this post on the web to see it in action.

We have seen variations on this in the still-image space. D-ID can create an image map and pan and zoom around it. The company can also take an image of a person and animate the mouth and face to match natural speaking movements.

The Motionleap app has a tool that lets you use your finger to “paint” the areas where you want motion. You can do this with any image, including those created with Motionleap’s own text-to-image solution, as in the video below. That said, Meta’s solution looks far more sophisticated.

Create Variations of Existing Videos

Another interesting feature is the ability to create variations of existing videos. Uploading the original video immediately below yields the variants further down the page.

Creators are always looking for new ways to leverage the content they already have. A solution of this sort could help them create more variation without significant additional effort.

Copyright Questions

It will be interesting to see how copyright issues play out in the text-to-video segment. Atomizing image elements and then reconstructing them with a generative model, such as a GAN or a diffusion model, is already getting people thinking about how much of the output is attributable to copyrighted source material. Will video be even more perilous territory for IP infringement? What happens when Disney sees something that it thinks looks like one of its assets?

This is something to monitor. Getty Images used legal uncertainty as an excuse to ban all AI-generated art from its platform. Others may follow its example, though it is hard to see anyone putting the text-to-X genie back in the bottle. The volume of output from these systems is growing rapidly and is on course to become pervasive.

There will, no doubt, be a high-profile lawsuit claiming copyright infringement over the output of one of these AI solutions. Whether the output is deemed transformative will be one of the key questions in establishing a new legal framework for text-to-X. The other will be who is liable: the platform, the person who created and used the work, or both?

Trust and Safety

Expect the rollout of Make-A-Video to be relatively slow and deliberate. According to Voicebot’s Eric Schwartz, Meta acknowledges the risk inherent in generative models trained on data that includes content from the internet. “The researchers aren’t shy about admitting to technical limits and inadvertently generating problematic videos because of the data pulled from hundreds of thousands of hours of video, including plenty scraped from the web.”

Meta used to “move fast and break things,” but the company has become more cautious over the years as it has faced increased scrutiny of its practices and taken hits to its reputation. It is unlikely to play fast and loose with AI-generated videos that could bring further social and regulatory pressure. Open-source projects will likely be the ones pushing the boundaries in this category.

The Text-to-X Era

We now have text-to-speech, text-to-music, text-to-text, text-to-code, text-to-image, text-to-video, and more. I’m sure the text-to-X trend will continue. Text-to-X is not the entirety of synthetic media, but it is the segment with the most momentum today, and I suspect that momentum will keep accelerating.

We hope you like Synthedia. Please share this post with someone you think will find it interesting.