Axios reported that Google Assistant will soon gain generative AI capabilities, arriving first on smartphones. The change was announced in an internal Google email shown to Axios, which also indicated that some layoffs will result from the merging and reorganizing of several teams.
Synthedia learned earlier this year that Amazon has similar plans for bringing generative AI to Alexa. There are indications that this may come as early as mid-September, during Amazon’s annual product launch event. However, whether the new solution will be ready for widespread distribution is unclear. There are technical, user experience, and business impediments to making a quick switch.
For example, Alexa is optimized for quick interactions: it has rules about how long it will wait for instructions and is designed to provide relatively short responses. Generative AI responses can take much longer to generate, and speaking them aloud to the user adds still more time.
Inference, the process by which a large language model (LLM) evaluates a request and generates a response, typically has far higher latency than the existing NLU-based solutions. Running LLMs is also more costly than NLU-based processing, a significant consideration for solutions operating at the scale of Alexa and Google Assistant.
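The latency problem can be made concrete with a simple deadline pattern: give the LLM a fixed time budget, and fall back to a quick NLU-style answer if it misses. This is a hypothetical sketch, not anything Amazon or Google has described; all timings, function names, and responses are illustrative:

```python
import concurrent.futures
import time

# Illustrative timings: NLU intent matching is fast, LLM inference is slow.
NLU_LATENCY_S = 0.05
LLM_LATENCY_S = 2.0
DEADLINE_S = 0.5  # voice assistants keep strict response budgets

def nlu_answer(utterance: str) -> str:
    time.sleep(NLU_LATENCY_S)  # simulate fast intent matching
    return "short canned response"

def llm_answer(utterance: str) -> str:
    time.sleep(LLM_LATENCY_S)  # simulate slow LLM inference
    return "long generated response"

def respond(utterance: str) -> str:
    """Try the LLM within the deadline; otherwise fall back to NLU."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(llm_answer, utterance)
        try:
            return future.result(timeout=DEADLINE_S)
        except concurrent.futures.TimeoutError:
            future.cancel()  # best effort; a running call cannot be cancelled
            return nlu_answer(utterance)

print(respond("what's the weather?"))  # deadline missed, prints the NLU fallback
```

A production system would stream partial LLM output or acknowledge the request while generating, but the sketch shows why existing timeout rules and LLM inference are in tension.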
I expect Amazon to announce and demonstrate an LLM-enabled Alexa in September, but its scope and/or availability may be narrow. Amazon will not go full-LLM: the original Alexa model will remain behind the assistant, and the LLM will be additive.
Enter the Arbitrator
Both Google and Amazon are likely to use an arbitrator to enable generative AI in the two leading voice assistants. An arbitrator will consider each user request, determine where to send it for processing, and decide which model has priority when both can fulfill the request.
Some requests will go to the legacy assistant models (i.e., what we know today as Google Assistant and Alexa), while others will be routed to the LLM for processing. You can think about this as the voice assistant becoming a front-end for multiple backend solutions with a traffic manager between the interface and the models.
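The traffic-manager idea can be sketched in a few lines. This is purely illustrative, since neither company has published its routing design; the route names, intent keywords, and priority scheme are all assumptions:

```python
# Hypothetical arbitrator: each route declares whether it can handle an
# utterance, and the highest-priority claimant wins.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Route:
    name: str
    can_handle: Callable[[str], bool]  # cheap eligibility check
    priority: int                      # higher wins when several routes match

# Stand-in for the legacy NLU intent classifier: a keyword list.
LEGACY_INTENTS = ("play", "set a timer", "turn on", "turn off")

def legacy_can_handle(utterance: str) -> bool:
    return any(kw in utterance.lower() for kw in LEGACY_INTENTS)

ROUTES = [
    Route("legacy_assistant", legacy_can_handle, priority=2),  # fast, cheap, reliable
    Route("llm", lambda _: True, priority=1),                  # catch-all for open-ended requests
]

def arbitrate(utterance: str) -> str:
    """Return the name of the backend that should process the request."""
    candidates = [r for r in ROUTES if r.can_handle(utterance)]
    return max(candidates, key=lambda r: r.priority).name

print(arbitrate("set a timer for 10 minutes"))  # legacy_assistant
print(arbitrate("write a haiku about summer"))  # llm
```

Giving the legacy route higher priority reflects the economics described above: device control and timers stay on the cheap, low-latency path, and only requests the legacy model cannot claim fall through to the LLM.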
Google Assistant once did this to a degree. It identified certain intents and routed them to specific data sources. At one point, it would route requests to Google Actions (i.e., apps) for questions that no other Google service could clearly fulfill. That routing was removed several years ago. In each of these cases, it appears that one central model determined routing.
Adjusting to Reality
In June 2022, Google announced the timeline for ending Assistant Actions. By November, Amazon was scaling back its Alexa team and ambitions. You could sense that both companies recognized they had taken their shot and had some successes but largely did not meet their lofty visions. They had taken the market as far as the technology and consumer interest would allow. Except. They hadn't.
ChatGPT launched the same month that Alexa was shedding staff. It was a new type of assistant that consumers couldn’t get enough of. Amazon has generated a lot of revenue (though maybe not a profit) from selling smart speakers. But the Alexa program struggled to generate even a few million dollars a year in incremental revenue.
Then ChatGPT comes along and amasses 100 million monthly active users in less than eight weeks. A month later, consumers are falling over themselves to pay $20 per month for access to GPT-4 models through the service and for better availability. Brad Gerstner, founder of Altimeter Capital, said off-handedly that ChatGPT has four million paying users. If true, that's nearly a billion dollars in annual revenue (4 million users × $20 × 12 months = $960 million). The Alexa service may generate millions of dollars through the custom deployment of Alexa for enterprise applications, but not for the consumer version. At best, it's a loss leader.
So generative AI is here, consumers like it, and the technology has changed expectations. ChatGPT, however, cannot play your favorite song on Spotify (today), turn on your TV, or set a timer. Alexa and Google Assistant continue to offer utility. But they also look tired, uninteresting, and based on yesterday's technology. Google has a relatively straightforward path for adding Bard features to Google Assistant. Amazon will have to work harder for Alexa.
Adam Cheyer, Siri’s original creator, says assistants offer impact in two key categories: knowing and doing. Siri, Alexa, and Google Assistant all made strong progress in providing “doing” features but did not get very far in the “knowing” category. He says ChatGPT nailed the “knowing” category but still has little in the way of “doing” features (and those may be a long wait). Maybe grafting generative AI features onto the “doing” assistants will produce the first assistants that excel at both knowing and doing.