6 Things Announced at Google I/O, One That Mattered A Lot, and One Mystery
Google is still catching up, but there are glimpses of significant innovation
Google I/O 2024 offered AI developers many futures and few presents. Most of the interesting AI-related features and applications are either not available yet, available only in the U.S., or restricted to a limited-access developer beta. But there were a lot of announcements and some provocative demos.
Several of the announcements are candidates for the biggest potential impact. However, because so much of the agenda focused on future applications and offered limited details, it is hard to say which will gain traction when they are ultimately released. Here are Synthedia’s top 6 announcements and one intriguing mystery.
1. Context Window Warriors
Gemini Pro 1.5 already has a one million token context window. Google’s Sissie Hsiao characterized that in terms of a 1,500-page PDF or an hourlong video. That giant context window is coming to the Gemini Advanced personal assistant. The model will also receive an update later this year to a two million token context window.
It is unclear how much this feature will be employed, but it is unique, and Google mentioned it frequently during I/O. There are different schools of thought on long context windows. One theory holds that if you could just run everything in memory (i.e., the context window), you wouldn’t have to worry about retrieval-augmented generation (RAG), data cleanup, or optimization.
However, large context windows add cost and latency while also showing significant item recall failure (i.e., inability to find information in the context window) at higher rates than vector databases. There are tradeoffs. Google’s other announcement, reduced token costs for multi-turn conversations with long context windows, is a clear acknowledgment of those tradeoffs. Still, it’s a differentiated feature, so it made sense for Google to shine a bright spotlight on the industry’s leading context window.
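For a sense of what the “just run everything in memory” approach looks like in practice, here is a minimal sketch using Google’s google-generativeai Python SDK. The API key, model name, file path, and prompt are placeholders, and exact SDK details may differ from this sketch; it is meant only to contrast long-context stuffing with a RAG pipeline, not to represent Google’s recommended pattern.

```python
# Minimal sketch: "stuff everything in the context window" vs. RAG.
# Assumes the google-generativeai SDK (pip install google-generativeai)
# and a valid API key; model name and file path are placeholders.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")
model = genai.GenerativeModel("gemini-1.5-pro-latest")

# Long-context approach: no chunking, no embeddings, no vector database.
# The entire document (up to ~1M tokens) rides along with every request.
with open("project_archive.txt") as f:
    document = f.read()

response = model.generate_content(
    [document, "List every open decision mentioned in this archive and who owns it."]
)
print(response.text)

# A RAG pipeline would instead chunk and embed the document, store the vectors,
# retrieve only the most relevant chunks per question, and send those to the
# model: cheaper and faster per call, but with more infrastructure to maintain.
```

The tradeoff described above shows up directly in this sketch: the long-context call is simple to build but pays for the full document on every request, while the RAG alternative trades that per-call cost for retrieval machinery and its own recall risks.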
2. Collecting Gems
Gemini Advanced users will also be able to create Gems, which appear to be Google’s version of GPTs. According to a Google AI blog post:
Simply describe what you want your Gem to do and how you want it to respond — like “you're my running coach, give me a daily running plan and be positive, upbeat and motivating.” Gemini will take those instructions and, with one click, enhance them to create a Gem that meets your specific needs.
OpenAI revealed earlier this week that ChatGPT Plus users have created more than one million GPTs since the feature was introduced in November 2023. Google said this feature will arrive “soon.” However, a GIF embedded in the blog post suggests Google may have quite a bit of catching up to do. The Gems example only includes the ability to describe how the user wants the Gem to act. For GPTs, users can add background descriptions, instructions, buttons with suggested starter prompts, and logos, and they can also create images and upload data to ground the responses.
Google may launch with these features, but their absence in the GIF suggests it may have to work towards parity with GPTs over a few releases. With that said, it is unclear how much users value GPTs. They created a lot of interest and conversions to paid users at launch. However, there is no data on how frequently they are used or what percentage of the one million GPTs are merely experiments that were created and soon forgotten.
It makes sense for Google to add Gems to Gemini Advanced, both for competitive parity and to give users additional customization options for their experience. There is no evidence so far that they will have a meaningful impact on user behavior, but customization often accompanies user loyalty.
3. Search Goes Generative (Present & Future)
This could be the most important announcement, given that search is a very large industry and Google is the dominant player. AI Overviews, multistep reasoning, planning, and video search were all introduced at I/O. While the latter three are arriving in Search Labs for a limited trial either relatively “soon” or “later this year,” AI Overviews has already begun rolling out in the U.S. According to Google VP and head of search Liz Reid:
People have already used AI Overviews billions of times through our experiment in Search Labs…With AI Overviews, people are visiting a greater diversity of websites for help with more complex questions. And we see that the links included in AI Overviews get more clicks than if the page had appeared as a traditional web listing for that query. As we expand this experience, we’ll continue to focus on sending valuable traffic to publishers and creators. As always, ads will continue to appear in dedicated slots throughout the page, with clear labeling to distinguish between organic and sponsored results.
4. AI Teammates in Workspace (Future)
Another candidate for the biggest announcement could be AI Teammates. If AI Teammates are actually agents with specific roles, expertise, experience, and memory, this could be a significant development. However, it doesn’t get the top spot because it is speculative and will not arrive until next year at the earliest. During the presentation on generative AI and Google Workspace, virtual teammates were introduced.
Now, as we look to 2025 and beyond, we're exploring entirely new ways of working with AI. Now, with Gemini, you have an AI-powered assistant always at your side. But what if you could expand how you interact with AI? For example, when we work with other people, we mention them in comments and docs, we send them emails. We have group chats with them, et cetera.
And it's not just how we collaborate with each other, but we each have a specific role to play in the team. And as the team works together, we build a set of collective experiences and contexts to learn from each other. We have the combined set of skills to draw from when we need help. So how could we introduce AI into this mix and build on this shared expertise?
Well, here’s one way. We are prototyping a virtual Gemini-powered teammate. This teammate has an identity and a Workspace account, along with a specific role and objective.
The demonstration went on to show how a teammate designed to keep track of project updates through emails, chat messages, planning systems, and so forth could answer team members' status questions or go into detail on specific issues at any time. Another role could be maintaining and updating a summary status for an event that is being planned across multiple chat rooms. This use case is similar to having an always-available assistant project manager.
Virtual teammates are a good idea. They will be customized AI assistants that can act as a shared resource. Note that Google suggests it is an experiment and that it won’t arrive until 2025 at the earliest. There are startups working on similar concepts, so this is not entirely novel, but, at the same time, it is unproven. Google and Microsoft would seem likely to succeed with this type of offering if they can make it truly useful. They both have a lot of productivity software users and strong platforms to build from. We will come back to this next year.
5. A Flashier Model (Present)
Gemini 1.5 Flash is the smaller, faster version of Gemini Pro 1.5. Google writes, “This smaller Gemini model is optimized for narrower or high-frequency tasks where the speed of the model’s response time matters the most.”
The Flash model also carries an inference cost of one-tenth that of 1.5 Pro, and its general-purpose benchmark results are largely comparable. Flash generates more tokens per second than comparable models, and one technical analysis placed it in the mid-range of model performance: below GPT-4o, Claude 3 Opus, Gemini Pro 1.5, and Mistral 8x22B, but better than GPT-3.5 Turbo, Claude 3 Haiku, and others in that category.
There are also some updated Gemma open-source smallish large language models (LLMs). However, so little detail was offered on the new Gemma models that it is impractical to elaborate on their impact. Gemini 1.5 Flash is available today through Google Cloud’s Vertex AI, and it is likely to become the “good enough” model in the Gemini portfolio that best balances cost, speed, and quality.
Google’s Demis Hassabis commented, “While it’s a lighter weight model than 1.5 Pro, it’s highly capable of multimodal reasoning across vast amounts of information and delivers impressive quality for its size.”
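As a rough illustration of where Flash fits, here is a minimal sketch of calling it through the Vertex AI Python SDK for the kind of narrow, latency-sensitive task Google describes. The project ID, region, and model ID are placeholders and may differ from what Google exposes at any given time.

```python
# Minimal sketch: calling Gemini 1.5 Flash via the Vertex AI Python SDK
# (pip install google-cloud-aiplatform). Project ID, region, and model ID
# are placeholders; actual availability and naming may differ.
import vertexai
from vertexai.generative_models import GenerativeModel

vertexai.init(project="your-project-id", location="us-central1")

model = GenerativeModel("gemini-1.5-flash-001")

# A narrow, high-frequency task where response speed matters more than
# peak quality, which is the use case Google positions Flash for.
response = model.generate_content(
    "Classify the sentiment of this review as positive, negative, or mixed: "
    "'The keynote ran long, but the demos were worth it.'"
)
print(response.text)
```

The design choice here is simply swapping the model ID: the same Vertex AI calling pattern works for Pro and Flash, so teams can route cheap, fast requests to Flash and reserve the larger model for harder queries.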
6. Agents or Just a Better Gemini? (Future)
Project Astra was the most interesting demo, and it points to the key capability that could truly set a digital assistant apart from the pack.
It is well understood that persistent memory is an important function of a personalized digital assistant. However, there are two other gaps to be bridged. The first is merging the knowing and doing assistants. ChatGPT and Gemini know things, while Siri and Google Assistant do things (i.e., execute tasks).
Note that Google has both sides of the equation in different applications. Its first big opportunity is to combine these functions that have heretofore been separate. There was a path to this in 2023 when Bard was added to Google Assistant, but it is unclear whether Gemini Advanced has any of Google Assistant’s task execution features. This gap looks like it will continue.
However, there is another significant opportunity. It involves AI agents. Demis Hassabis commented during the I/O keynote:
As part of Google DeepMind’s mission to build AI responsibly to benefit humanity, we’ve always wanted to develop universal AI agents that can be helpful in everyday life. That’s why today, we’re sharing our progress in building the future of AI assistants with Project Astra (advanced seeing and talking responsive agent).
To be truly useful, an agent needs to understand and respond to the complex and dynamic world just like people do — and take in and remember what it sees and hears to understand context and take action. It also needs to be proactive, teachable and personal, so users can talk to it naturally and without lag or delay.
…
With technology like this, it’s easy to envision a future where people could have an expert AI assistant by their side, through a phone or glasses. And some of these capabilities are coming to Google products, like the Gemini app and web experience, later this year.
As you move from a knowing assistant to a doing assistant that can execute tasks and retain memory, the next level of value is an agent that does useful things even when you are not present. Project Astra doesn't appear to be an agent, just a more sophisticated version of Gemini Advanced. With that said, the demonstrations reflected a sophisticated multimodal-capable copilot.
A more practical demo appeared at the end of the developer keynote. A Google engineer had the Gemini assistant watch the main keynote with him in real time and then asked it questions about the presentations. There were a couple of glitches, but it was impressive nonetheless. It appears similar to some features OpenAI introduced around interacting with desktop applications, but with immediate reactions and feedback to a live stream. You can view it at this timestamp.
Google has offered some impressive demonstrations before that didn’t turn into products at all, and others that didn’t quite meet expectations. In this case, we did see some features that looked meaningful, and you can imagine that they are close to a beta release. Time will tell. If Project Astra meets Hassabis’ expectations, it will be the biggest impact announcement of Google I/O 2024.
Smart Glasses Mystery
One of the most intriguing elements of Google I/O was the quiet inclusion of smart glasses as part of the multimodal Project Astra demo. It begins at 1:20 in the video, when the presenter asks Gemini if it has seen her glasses. After Gemini notes that they are on the desk, the presenter puts down her smartphone, collects the glasses, and puts them on to continue the demo. We quickly figure out they are smart glasses as the vision features are employed to identify a whiteboard drawing, and Gemini offers advice on how to improve the solution architecture.
The interaction was brief, and the glasses were only mentioned casually one more time at Google I/O. DeepMind co-founder and Google’s leading AI executive, Demis Hassabis, commented after the demo:
I think you'll agree it's amazing to see how far AI has come, especially when it comes to spatial understanding, video processing and memory. It’s easy to envisage a future where you can have an expert assistant by your side through your phone or new exciting form factors like glasses.
Google famously failed to succeed with the Glass smart glasses, shutting down the consumer version in 2015 and the enterprise edition in 2023. However, in June 2020, it also acquired smart glasses maker North and its many patents. What we saw in the video may be a new product built from the remnants of North. Or, it may be a product from a fellow member of the XR Alliance. Either way, Google is thinking about Gemini on smart glasses.
Meta is also betting on smart glasses, with its Ray-Ban Meta glasses serving as distribution points for its AI assistant. Google has an advantage due to its control of Android smartphones, but the pairing of a competent voice assistant and smart glasses is potentially potent for consumers.
What’s Next
It is no secret that Google has been playing catch-up. The “code red” issued by Google CEO Sundar Pichai in December 2022 was an acknowledgment that the company had to make up ground on the dynamic duo of OpenAI and Microsoft. Seventeen months later, Google has some very competent products and one notably unique generative AI feature. It may be emerging as the best of the rest, but it is striking that it has not been able to close the gap more concretely.
The contrast between OpenAI and Google was made starker by the announcement of the GPT-4o model and the busload of new ChatGPT demos that debuted earlier in the week. OpenAI took a page out of Google’s PR book and invaded a planned media cycle by staging its event just prior to Google I/O. It worked. Sort of. It is fair to say that OpenAI received a lot of rave reviews and a few yawns. However, it deserves kudos for introducing first-to-market capabilities in its personal assistant and dramatically reducing GPT-4 inference costs.
Google I/O generated mixed reactions. Some people were energized and excited, while others thought it fell short of expectations. However, a big part of those reactions is that most people expect a lot from Google. Even if Google is still catching up in some areas, it has made a lot of progress and may be taking the lead in some specific segments. The area where I expect Google to close the gap fully over the next 18 months is consumer and employee assistants. It will have a harder time meeting broader enterprise expectations, but there is now a path to success.