Adobe Firefly Is Generative AI for Designers and Maybe for the Rest of Us
Firefly adds video editing via natural language text prompts
Midjourney, Stable Diffusion, and DALL-E are great for generating new images. But what if you want to make changes?
You then need to import the image into an editing tool and know how to get just the right lighting, adjust or remove artifacts, and so on. Some artists I know will use three or four tools to make minor adjustments before finalizing the work. If you are a designer using text-to-image generators as part of your work, you probably do this daily.
In addition, AI models with natural language inputs can help you generate images, but they are not as deft at making changes. They certainly offer less iterative design control than designers are accustomed to.
Adobe knows its customer base. Designers need tools to refine images, audio, and video. Adobe applications such as Illustrator, Photoshop, and Premiere Pro each have hundreds of discrete controls. It is not always convenient to hunt through menus to find the feature you need. Even if you know where a feature is located in the UI, you may not have the expertise to use it. And, of course, there are probably many features you don’t know about at all.
Natural language inputs address all of these problems. Changing the lighting to a “cold morning” or the “golden hour” with text is going to be faster and might deliver better results for a lot of users. Bringing this to audio and video might have an even greater impact.
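To make that idea concrete, here is a minimal, hypothetical sketch of how a text prompt like “golden hour” might map onto conventional editing parameters. This is purely illustrative of the concept, not Adobe’s implementation; the preset table, factor values, and helper function are assumptions, and it uses the open-source Pillow library rather than any Adobe tool.

```python
# Hypothetical sketch: mapping a natural-language lighting prompt to concrete
# image adjustments. Illustrative only; the presets and values are assumptions.
from PIL import Image, ImageEnhance

# Assumed lookup: each phrase maps to warmth, brightness, and contrast factors.
LIGHTING_PRESETS = {
    "golden hour": {"warmth": 1.25, "brightness": 1.05, "contrast": 1.10},
    "cold morning": {"warmth": 0.80, "brightness": 0.95, "contrast": 1.05},
}

def apply_lighting(image: Image.Image, prompt: str) -> Image.Image:
    """Apply a crude lighting preset chosen from a text prompt."""
    preset = LIGHTING_PRESETS.get(prompt.lower())
    if preset is None:
        raise ValueError(f"No preset for prompt: {prompt!r}")

    # Shift color balance toward warm or cool by scaling the red/blue channels.
    r, g, b = image.convert("RGB").split()
    r = r.point(lambda v: min(255, int(v * preset["warmth"])))
    b = b.point(lambda v: min(255, int(v / preset["warmth"])))
    adjusted = Image.merge("RGB", (r, g, b))

    # Then adjust overall brightness and contrast.
    adjusted = ImageEnhance.Brightness(adjusted).enhance(preset["brightness"])
    adjusted = ImageEnhance.Contrast(adjusted).enhance(preset["contrast"])
    return adjusted

# Usage:
# photo = Image.open("scene.jpg")
# apply_lighting(photo, "golden hour").save("scene_golden_hour.jpg")
```

The point is not the specific adjustments; it is that a short phrase can stand in for a sequence of expert-level parameter tweaks the user would otherwise have to know how to perform.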
Creativity Fueled by Natural Language
Adobe Firefly debuted in beta last month at the company’s annual user conference. The first features included a text-to-image generator similar to what you might expect from OpenAI’s DALL-E or Stability AI’s Stable Diffusion. To be clear, Firefly is not a new Adobe product. It is the brand name for several AI models that Adobe developed to support generative AI-based features in its existing products.
Earlier today, Adobe announced that audio and video controls based on generative AI would be rolling out later this year for Premiere Pro and After Effects in what is being called Adobe Firefly for Video.
We are entering a new era where generative AI will enable a natural conversation between creator and computer — where typing in your own words and simple gestures will combine with the best of professional creative application workflows to enable new creative expression.
…
Last month we announced Adobe Firefly, the next major evolution of AI-driven creativity and productivity. Firefly is our family of creative generative AI models, starting with image generation and text effects.
…
At NAB, we are expanding the vision for Firefly to imagine ways we can bring generative AI into Adobe’s video, audio, animation and motion graphics design apps.
Making Power Tools Easier to Use
Generative AI’s ability to create new content has driven most of the early interest in the technology. Some of the Firefly models fall into this category. The next phase of adoption is already emerging: using natural language as an easier way to execute tasks and access advanced features without requiring expert knowledge.
This could be particularly useful for Adobe. Its Creative Cloud applications are often viewed as intimidating because they have so many features. They are for power users and other brave souls willing to train themselves on complex features and nomenclature. That is one reason tools like Figma and Canva have been able to take so much market share from Adobe. They do not offer all of the features and fine-grained control, but they are more approachable for everyday users below the expert level.
With this in mind, Adobe has adopted the copilot term that Microsoft favors. Using text to activate commands for editing video, audio, and images can make power users more efficient and novices more capable. The copilot is your assistant within a productivity application. The more complex a solution’s input, output, or control options, the more value a copilot will provide.
The video at the top of this post includes a couple of fantastic use cases. One shows how you can simply ask for a soundtrack relevant to the video footage, while another shows how you can find and insert b-roll footage (i.e., footage of activities beyond the person talking) related to what a person is saying. If this works as shown (marketing demos do not always reflect everyday performance, but I will suspend disbelief for a moment), it will be incredibly useful, reducing both project time and the cognitive load on designers.
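For the b-roll case, here is a minimal, hypothetical sketch of how an assistant might match spoken words to a tagged clip library. Adobe has not described its implementation; the clip names, tags, and scoring below are assumptions meant only to show the shape of the idea.

```python
# Hypothetical sketch: suggest a b-roll clip by scoring a tagged library
# against a transcript snippet. Illustrative only; real systems would likely
# use semantic embeddings rather than keyword overlap.

CLIP_LIBRARY = {
    "city_traffic.mp4": {"city", "cars", "commute", "street"},
    "coffee_pour.mp4": {"coffee", "morning", "cafe", "cup"},
    "mountain_hike.mp4": {"mountain", "hiking", "trail", "outdoors"},
}

def suggest_broll(transcript: str) -> str:
    """Return the clip whose tags overlap most with the spoken words."""
    words = set(transcript.lower().split())
    scores = {clip: len(tags & words) for clip, tags in CLIP_LIBRARY.items()}
    return max(scores, key=scores.get)

print(suggest_broll("Every morning I grab a coffee before my commute"))
# -> coffee_pour.mp4 (two tag matches beat one)
```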
Generative AI is not all about creation. Sometimes, it is simply about letting humans talk to computers to make tasks easier. This is another reason large language models (LLMs) generate so much excitement and have the potential to impact just about everything.