Google Gemini Looks Amazing in Video Demos, But Marketing Makes Another Unforced Error
The missteps undermine what looks like impressive technology
It’s been a year since Google CEO Sundar Pichai issued a “code red” in response to the launch of ChatGPT. That initiative cleared the way for several new products, ranging from the Bard and Duet AI assistants to the PaLM and Gemini large language models (LLMs) to Codey, MedPaLM 2, and others. The effort has also featured some stumbles.
The Bard announcement was overshadowed by an inaccuracy in the chatbot’s responses that highlighted the generative AI assistant’s proclivity for generating falsehoods. This past week, a generally favorable reception of the announcements around the new Gemini LLM was overshadowed by the editing in one of the videos. Bloomberg reported:
Google also admits that the video is edited. “For the purposes of this demo, latency has been reduced and Gemini outputs have been shortened for brevity,” it states in its YouTube description. This means the time it took for each response was actually longer than in the video.
In reality, the demo also wasn’t carried out in real time or in voice. When asked about the video by Bloomberg Opinion, a Google spokesperson said it was made by “using still image frames from the footage, and prompting via text,” and they pointed to a site showing how others could interact with Gemini with photos of their hands, or of drawings or other objects. In other words, the voice in the demo was reading out human-made prompts they’d made to Gemini, and showing them still images. That’s quite different from what Google seemed to be suggesting: that a person could have a smooth voice conversation with Gemini as it watched and responded in real time to the world around it.
The video at the top of the page is indeed impressive. It shows off Gemini’s multimodal capabilities of speech recognition, conversational interaction, vision, and text generation. But is it true?
My guess is that many of those features have been demonstrated in the lab, but with more latency, some errors, and likely not in the uninterrupted sequence depicted in the video. Once you start editing a video, it is hard to leave the blemishes in, especially after the Bard debut fiasco. I even refrained from writing about that earlier error because it could have been an honest mistake, and it was a minor element of the announcement. In many ways, I view the latest Google generative AI marketing controversy similarly. But…
Unforced Errors
It’s fair to say the video is misleading, maybe not in a legal sense, but definitely from a perception standpoint. That perception boomeranged from amazement to scorn within 24 hours. “Google Lied to Us” was the headline of two separate YouTube videos. TechCrunch wrote, “Google’s best Gemini demo was faked.” BGR offered the title, “Google Gemini video that stunned the world is fake.” “Google faces controversy over edited Gemini AI demo video,” said CNBC.
This would not be such a big deal if Google didn’t have a history of marketing mishaps and unforced errors. I was present for the first public demonstration of Google Duplex. The performance of an assistant that could book appointments and restaurant reservations for you by independently calling the businesses seemed too good to be true. And it was.
A year after its launch, Google Duplex was still using humans to book appointments. The company told The New York Times that about one-third of calls included a human in the loop, while the rest were fully automated. Journalists testing the service, however, reported results suggesting that humans were involved in about 75% of calls.
Demo videos get edited all of the time, many without any disclosure. And ChatGPT also makes errors. While the criticism may seem unfair, the problem is Google itself. It has overpromised in the past and has not been fully forthcoming. Ever since ChatGPT launched, Google has been rushing out media events shortly before its products are ready.
Everyone knows that Google’s early lead in generative AI evaporated due to misprioritization, bureaucracy, and lack of urgency. Is it hubris or just a pattern of mistakes? Google executives are acting as if they think the company is the leader in generative AI and desperately want everyone else to think that, too.
Today, Google is behind in generative AI products, market share, and mindshare. It might be able to leapfrog OpenAI and other rivals in technology and products. It could even close the gap in market share. However, Google is tone-deaf in reading the market and seems to be on a mission to self-sabotage on the mindshare front.
Amazon, which has far less to work with, looks like a better partner option precisely because it is not overpromising and underdelivering. At least so far. OpenAI and Microsoft are in a different league on the perception front right now, and Google has made it easy for them. With that said, I think the Gemini LLMs might turn things around if the company can avoid more marketing mishaps.
Gemini Does Look Amazing
Indeed, several of Gemini’s features are already in ChatGPT. It is also true that Gemini likely has some features not found in ChatGPT or any other rival. Check out the videos below. If Gemini can do these things, either within Bard or through custom solutions once the Ultra model is available, it will be significant. Keep in mind that the following videos do not include the disclaimer attached to the video at the top of the page.
Granted, three of these videos are in a section of Google’s YouTube channel called “The potential of Gemini.” Maybe that should give us pause. Is it the potential and not the reality of Gemini?
Google said the Gemini Pro model is coming to Bard now and should be comparable to or slightly better than GPT-3.5, the most commonly used OpenAI model today and the one available to free ChatGPT users. The expectation set by the marketing is that the higher-performing Ultra model will be available in Bard in January. But maybe these features aren’t going to be in Bard at all; they may only potentially appear there.
I’d like to write a story about how interesting these features could be and how they push the assistant field forward. But I can’t at this time. From now on, I will have to wait until Google’s features are available before passing judgment.