Runway To Take on OpenAI's Sora with Enhanced Quality and Custom Models
Runway announced its new Gen-3 Alpha model, which features more expressive faces, sharper video, and more control over style and motion than the company's earlier-generation models. However, Runway is not promising minute-long generative AI videos, as OpenAI has with Sora.
Runway's upgrade arrives at a pivotal time for the startup. OpenAI shocked many industry insiders with the quality, versatility, and duration of videos produced by its Sora text-to-video model. That immediately drew comparisons with Runway and competitor Pika Labs, which had been seen as leaders in the segment before Sora's debut.
Runway's CTO, Anastasis Germanidis, told VentureBeat that the Gen-3 Alpha model will be available "in the coming days," first to paid users and then to free users. That means it will likely be in the hands of creators before Sora, which OpenAI has made available to only a handful of artists. According to Runway's announcement:
Trained jointly on videos and images, Gen-3 Alpha will power Runway's Text to Video, Image to Video and Text to Image tools, existing control modes such as Motion Brush, Advanced Camera Controls, Director Mode as well as upcoming tools for more fine-grained control over structure, style, and motion.
Control Matters
VentureBeat's interview with Germanidis also revealed some of what Runway has learned since launching its Gen-1 and Gen-2 models. According to Germanidis:
When we first released Gen-2, you could only prompt it in a very simple manner with text, and we quickly added a lot of controls around camera motion, around object motion, and that proved to be really important for how people are using Gen-2 today.
So in Gen-3 Alpha, we basically invested a lot more time and resources in that, and spent a lot of time on the captioning of data that gets put in while doing training.
Now you can prompt it with really complex interactions, and you can prompt it with how the camera moves, in different styles, in the interaction of characters. So that’s a big focus, number one.
Video simply has more variables than images, which makes incoherence a bigger risk and gives users more reason to want control over the output. That insight led to a number of new features for manipulating video outputs, and it ultimately shaped the training of Gen-3, as the sketch below illustrates.
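To make the idea of layered controls concrete, here is a minimal illustrative sketch of how a text-to-video request might separate the prompt from camera and motion parameters. The field names, values, and structure are invented for this example and are not Runway's actual API; the point is only that the text prompt becomes one input among several.

```python
# Hypothetical illustration only: these fields are invented for this
# sketch and do not reflect Runway's actual API.
import json

request = {
    "prompt": "a lighthouse on a cliff at dusk, waves crashing below",
    # Controls layered on top of the text prompt, in the spirit of
    # Motion Brush and Advanced Camera Controls described above.
    "camera": {"pan": "left", "zoom": 0.2, "tilt": 0.0},
    "motion": {"subject": "waves", "intensity": 0.7},
    "style": "cinematic",
    "duration_seconds": 10,
}

print(json.dumps(request, indent=2))
```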
Scale Matters
Germanidis also discussed the importance of scale in both the training data and the training process: greater scale led to fewer errors and less incoherence.
It's been proven in the language domain how much scaling compute can lead to a much greater range of capabilities for models. And we're seeing the same with video.
With increased compute we saw that the model was able to learn things like geometric consistency of objects and characters — not morphing as the video progresses over time, which has been an issue with previous video generation models. We learned that prompt adherence improves quite a bit as you scale.
Customization Matters
Another element of Runway’s announcement was easy to miss, but it may turn out to be very significant. The company will now customize its models for select customers.
As part of the family of Gen-3 models, we have been collaborating and partnering with leading entertainment and media organizations to create custom versions of Gen-3.
Customization of Gen-3 models allows for more stylistically controlled and consistent characters, and targets specific artistic and narrative requirements, among other features.
Companies can request fine-tuning or a custom-trained model. This has the potential to make Runway the most customer-centric of the text-to-video model developers.
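To make the offering concrete, here is a minimal sketch of what a custom-model request might capture, assuming a partner supplies proprietary footage plus style and character targets. Every name and field below is hypothetical; Runway has not published a public interface for this program.

```python
# Hypothetical sketch of a custom-model request; all names are
# invented, as Runway has not published an interface for this program.
from dataclasses import dataclass, field

@dataclass
class CustomModelRequest:
    base_model: str = "gen-3-alpha"
    # Proprietary footage the partner supplies for fine-tuning.
    training_clips: list[str] = field(default_factory=list)
    # The targets the announcement highlights: a controlled style
    # and consistent recurring characters.
    style_reference: str = ""
    recurring_characters: list[str] = field(default_factory=list)

request = CustomModelRequest(
    training_clips=["s3://studio-footage/episode-01/clip-001.mp4"],
    style_reference="hand-painted, muted palette",
    recurring_characters=["captain", "navigator"],
)
print(request)
```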
OpenAI is unlikely to offer this kind of customization, and it may not even enable customers to fine-tune Sora on their own. Pika also seems unlikely to offer it, as the company is more focused on building out editing tools. The more likely rivals in this segment are service providers that use Stable Video Diffusion as their foundation model. However, those providers will not have the intimate knowledge of a text-to-video model that resides in the research team that created it.
It is unclear how big the market is for custom text-to-video AI models. If customization turns out to be a significant priority for high-volume users, Runway may be positioned to realize the value its early investors were confident of.
Competing Against Giants
Runway's biggest challenge is competition from OpenAI, which seemingly has unlimited data, unlimited computing capacity, and a large cadre of skilled AI research scientists. Runway is unlikely to win on technical merit alone, which makes its push to build more creator-friendly models and tooling a smart one. As video quality improves across the board, the features beyond the prompt may well determine a user's choice of text-to-video engine.
That will matter as Runway continues to grow into the valuation it secured in late 2022.