The news around deepfake technology outside of Hollywood has been almost universally negative. Most of the negative stories have been driven more by fear of what deepfakes could lead to than by what has actually occurred. That’s not to say bad things haven’t occurred. It’s that the fear-to-reality ratio is unjustifiably skewed to the negative. It reminds me a bit of Brandon Kaplan’s talk about techno panic and the Luddites.
Hollywood is the exception. Use of deepfake technology in the Star Wars franchise has been well received, and Metaphysic’s performance on the America’s Got Talent (AGT) televised talent competition was enormously popular. However, many people see the negative stories or a humorous face-swap of a former U.S. president, and they think they know the whole story. Apple may be positioned to tell a new story with more compelling consumer use cases.
Apple’s Patent
The patent begins by explaining that the model produces images that are not real but starts from a reference image of a real person (Patent No. US 11475608 B2). That image, combined with the trained model and prompts for an expression and pose, can create a new image that never existed. The architecture is explained in Figure 3 of the patent.
(1) This disclosure relates to systems that generate simulated images of human faces.
(2) Machine learning techniques have been applied to generate synthetic still images and video sequences. These synthetic still images and video sequences may be photorealistic representations of real people, places, or things, but are not real images.
…
(8) In the systems and methods that are described herein, synthetic images of human faces are generated based on a reference image. The synthetic images can incorporate changes in facial expression and pose. At inference time, a single reference image can generate an image that looks like the person (i.e., the subject) of the reference image, but shows the face of the subject according to an expression and/or pose that the system or method has not previously seen. Thus, the generated image is a simulated image that appears to depict the subject of the reference image, but it is not actually a real image. As used herein a real image refers to a photographic image of a person that represents the person as they appeared at the time that the image was captured.
…
(11) The image generator is a trained machine learning model (e.g., neural network) that is configured to generate an image that looks like a realistic image of a human face, incorporates a face shape (e.g., including facial expression and pose) that is consistent with the face shape from the rendered version of the target face shape, and is consistent with the identity of the subject of the reference image (e.g., the person depicted in the generated image appears to be the same person as the subject of the reference image). The image generator is trained to constrain generation of the output image based on the input image such that the output image appears to depict the subject of the input image.
N.B.: All of the figures in the patent application can be found here.
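The flow in the patent excerpt above can be sketched roughly as follows: an identity embedding derived from the reference image is combined with a rendered target face shape driven by expression and pose inputs. This is purely an illustrative sketch, not Apple's implementation; every function name here is hypothetical, and the "networks" are random stubs standing in for trained models.

```python
import numpy as np

EMBED_DIM = 128
IMG_SHAPE = (256, 256, 3)

def encode_identity(reference_image: np.ndarray) -> np.ndarray:
    """Stub identity encoder: maps the reference image to an identity
    embedding. A real system would use a trained neural network."""
    seed = abs(hash(reference_image.tobytes())) % (2**32)
    return np.random.default_rng(seed).standard_normal(EMBED_DIM)

def render_target_shape(expression: str, pose: str) -> np.ndarray:
    """Stub renderer: produces a rendered target face shape from the
    expression/pose prompts (e.g., a rasterized face mesh)."""
    seed = abs(hash((expression, pose))) % (2**32)
    return np.random.default_rng(seed).standard_normal(IMG_SHAPE)

def generate(reference_image: np.ndarray,
             expression: str, pose: str) -> np.ndarray:
    """Fuse the identity embedding with the rendered target shape to
    produce a synthetic image of the same subject in a new expression
    and pose. A trained generator would do the fusing; here we just
    mix the two so the output depends on both inputs."""
    identity = encode_identity(reference_image)
    shape = render_target_shape(expression, pose)
    out = shape + identity.mean()
    return np.clip(out, -3.0, 3.0)

reference = np.zeros(IMG_SHAPE)
smiling = generate(reference, expression="smile", pose="frontal")
print(smiling.shape)  # (256, 256, 3)
```

The key point the sketch tries to capture is the separation of concerns the patent describes: identity is constrained by the reference image, while expression and pose are controlled independently, so a single photo can yield expressions the camera never captured.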
Deep Nostalgia
This solution sounds a lot like the Deep Nostalgia product offered by MyHeritage and created by D-ID. However, that solution animates the image based on the way it looks. Apple’s solution purports to change facial expressions and poses. So, not only could you animate an old photograph, you may be able to make the subject smile or take on other expressions that were not represented in the image and may never have existed.
Is This Even Deepfake Technology?
There are a number of ways Apple’s patented technology is different from traditional deepfake technology. Traditional methods begin with a video and produce an altered version of that video. Apple’s process begins with a photo and then animates it as a video.
From a practical standpoint, this means traditional deepfakes require someone to be in the source video. It is usually an actor, whose face and other characteristics are swapped out to make it look like someone else (N.B. see examples below if you are not familiar with the technology). That requirement of human participation in creating the video is a downside. It doesn’t actually reduce the level of effort (in fact, the effort increases) and instead merely manipulates the visual output. Apple’s patent appears to eliminate that extra effort and simply manipulates existing assets.
What Are the Applications?
We already know of one potential application from the Deep Nostalgia example. Beyond that, this does seem a lot like a candidate technology for a photorealistic Memoji. And it doesn't have to be limited to the user. In theory, you could use any photo. This has great potential for celebrity meme generation.
Alas, Apple is very conservative about this type of thing. Don’t be surprised if there are use limitations. I suspect Apple has twin use cases in mind. The first is a photorealistic Memoji. The second is use for some sort of camera filter.
Popularizing Deepfake Technology
Apple is notoriously slow in commercializing new technology, so I do not expect to see this in 2023. However, if it does arrive in iPhones, it will have the biggest impact to date in exposing the masses to deepfake technology presented in a positive light. Then again, I am pretty sure Apple will never use the deepfake term. It does not appear in the patent.
I propose they call the Memoji example of this Phomoji. It all starts with a photo, and the sound has just the right connotation in French.
Deepfake Examples
Below are three Hollywood examples of deepfake technology: the legendary DeepTomCruise, Star Wars, and a Pulp Fiction parody.
Story credit goes out to Patently Apple for first identifying the patent award. You can read their article here.