Amazon Alexa Gets Generative AI Makeover and Explores New Use Cases for Consumers
The future of Alexa is the LLM and not the Intent Model
“I often say that Alexa is the best personal AI out there, but I will tell you, it's always been a little bit more transactional than we would like, but that is a limit of the technology, not the vision. And thanks to our latest LLM, you can now have a near humanlike conversation with Alexa.” - Dave Limp, SVP of Amazon Devices
Dave Limp is SVP of Amazon Devices for a few more weeks. For years, he has personally demonstrated many of Alexa’s latest features at Amazon’s annual product launch event, and he took the stage one last time to highlight how the Alexa experience changes when you add a large language model (LLM) to the mix.
The differences are subtle but noticeable to anyone who is familiar with the Alexa experience. Users will be able to activate the LLM-enabled experience (“Let’s Chat”) and continue to use the traditional Alexa experience.
Will this give new life to Alexa? That is unclear. However, it may be the first example of the fusion of a “knowing” and “doing” assistant. More on that below.
What’s Different?
A lot is going on in that demo that may not be obvious. In addition, a couple of promo videos shown earlier in the conversation revealed other changes to how Alexa operates in the “Let’s Chat” mode. Here are a few changes coming to your Alexa experience:
Open-ended conversations - Alexa’s “Let’s Chat” feature enables conversations about any topic because it is connected to the internet, and the LLM can access several web services to answer questions.
Barge-in - You will be able to interrupt Alexa when speaking with it on an Echo Show device with a camera, thanks to the new adaptive context feature. The system considers both audible and visual cues to understand whether you are addressing the assistant, attempting to barge in, or just listening.
No wake word (sometimes) - When you are using “Let’s Chat” and looking at an Echo Show with a camera, it will be able to recognize you are making a request, even without saying “Alexa.”
Context maintenance - It is not clear how long a conversation persists. However, the demo showed Limp stepping away from the conversation multiple times and then just picking up where he left off after making some side comments to the audience.
Personalization - This is not entirely new to the Alexa ecosystem, but it appears that the LLM-enabled solution has either adopted or extended this capability.
These features will roll out first in the U.S. Limp mentioned that all U.S. Alexa device users would have access to a “free preview.” That suggests it might not be free after the preview. There was no word on international availability.
How it Works
I spoke with Rohit Prasad, SVP and head scientist of artificial general intelligence at Amazon. He clarified that today, Let’s Chat accesses an LLM, and otherwise, Alexa operates the same way it did previously. When the user invokes the feature by saying “Let’s Chat,” the LLM decides which services will be used to formulate an answer and generates the natural language response. In this case, the intent model is not in the loop.
He suggested that the intent-model Alexa and LLM Alexa will run in parallel for some time. However, he also indicated that he expects a single Alexa environment driven by the LLM at some point. Prasad also said the LLM used for Alexa is not Titan, the LLM offered by Amazon through AWS. It is a custom model.
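Conceptually, that parallel arrangement amounts to routing each utterance down one of two paths. Here is a minimal sketch of that idea, assuming hypothetical class names and toy handlers; it illustrates the routing Prasad described, not Amazon’s implementation.

```python
# Illustrative only: a minimal sketch of an intent-model path and an LLM path
# running side by side, routed by whether the user has entered "Let's Chat"
# mode. All class and function names here are hypothetical, not Amazon's.

class IntentModelAssistant:
    """Traditional path: map the utterance to a predefined intent."""
    def handle(self, utterance: str) -> str:
        if "lights" in utterance.lower():
            return "TurnOnLightsIntent -> smart home service"
        return "FallbackIntent -> 'Sorry, I don't know that one.'"

class LLMAssistant:
    """LLM path: the model chooses services and generates the reply."""
    def handle(self, utterance: str) -> str:
        # In a real system, the LLM could call web services (weather,
        # recipes, sports scores) before composing a natural response.
        return f"LLM-generated answer to: {utterance!r}"

class AssistantRouter:
    def __init__(self):
        self.intent_path = IntentModelAssistant()
        self.llm_path = LLMAssistant()
        self.lets_chat_mode = False

    def handle(self, utterance: str) -> str:
        if utterance.strip().lower() == "let's chat":
            self.lets_chat_mode = True
            return "Okay, let's chat. What's on your mind?"
        # Once "Let's Chat" is active, the intent model is not in the loop.
        if self.lets_chat_mode:
            return self.llm_path.handle(utterance)
        return self.intent_path.handle(utterance)

if __name__ == "__main__":
    router = AssistantRouter()
    print(router.handle("turn on the lights"))      # intent-model path
    print(router.handle("let's chat"))              # switch modes
    print(router.handle("plan a taco night menu"))  # LLM path
```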
Prasad said that grounding the model in truthfulness is achieved, in part, by using external data sources and retrieval-augmented generation (RAG) over vector databases. He also noted that Alexa has always had guardrails around topics it will not chat about, and he acknowledged that latency is somewhat longer when using the LLM, as would be expected. However, in my hands-on testing, I thought it was only marginally slower than the typical Alexa interaction.
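For readers unfamiliar with the technique, RAG retrieves relevant passages from an indexed knowledge source and feeds them to the model alongside the question so the answer stays anchored to real data. The sketch below shows that general pattern with a toy in-memory vector store and placeholder embeddings; none of it reflects Amazon’s actual components.

```python
# Illustrative only: the general retrieval-augmented generation (RAG) pattern,
# not Amazon's implementation. Embeddings are a toy bag-of-words hash so the
# example runs without external models or services.
import numpy as np

def embed(text: str, dim: int = 256) -> np.ndarray:
    """Toy embedding: hash each word into a fixed-size, normalized vector."""
    vec = np.zeros(dim)
    for word in text.lower().split():
        vec[hash(word) % dim] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm else vec

class VectorStore:
    """Minimal in-memory stand-in for a vector database."""
    def __init__(self):
        self.texts, self.vectors = [], []

    def add(self, text: str) -> None:
        self.texts.append(text)
        self.vectors.append(embed(text))

    def search(self, query: str, k: int = 2) -> list[str]:
        scores = [float(np.dot(embed(query), v)) for v in self.vectors]
        top = np.argsort(scores)[::-1][:k]
        return [self.texts[i] for i in top]

def grounded_prompt(question: str, store: VectorStore) -> str:
    """Retrieve passages and build the prompt an LLM would answer from."""
    passages = store.search(question)
    return (
        "Answer using only these sources:\n- "
        + "\n- ".join(passages)
        + f"\n\nQuestion: {question}"
    )

if __name__ == "__main__":
    store = VectorStore()
    store.add("The Seahawks play their next home game on Sunday at 1:05 pm PT.")
    store.add("Echo Show devices include a camera used for visual cues.")
    print(grounded_prompt("When do the Seahawks play next?", store))
```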
Only a couple of Alexa skills can leverage this capability. One of the more notable solutions is Character.ai. The company has activated two of its characters in an Alexa skill and intends to add more. Volley, the leading game maker on Alexa and FireTV, was mentioned as building solutions, but none were on display at the event.
How it Impacts Developers
For Alexa skill developers, new features are available in a limited preview of the Alexa Skills Kit (ASK). Alexa skills will be changing in the future, but I was unable to learn the details. You can anticipate the eventual elimination of intents because Amazon intends to phase out the intent model and centralize the solution on the LLM. For now, the LLM will simply be treated as an API for skills that leverage the technology.
You can learn more about developing using the new LLM API here.
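To make the contrast concrete, here is a short sketch of today’s intent-handler pattern from the ASK SDK for Python next to a placeholder for an LLM-backed path. The handler reflects the existing SDK; the generate_reply function is purely hypothetical, since Amazon has not published details of the new LLM API.

```python
# The intent-handler pattern developers use today (ASK SDK for Python),
# plus a hypothetical placeholder for treating an LLM as an API.
from ask_sdk_core.skill_builder import SkillBuilder
from ask_sdk_core.dispatch_components import AbstractRequestHandler
from ask_sdk_core.utils import is_intent_name

class TacoNightIntentHandler(AbstractRequestHandler):
    """Traditional model: the developer enumerates intents up front."""
    def can_handle(self, handler_input):
        return is_intent_name("TacoNightIntent")(handler_input)

    def handle(self, handler_input):
        speech = "Taco night it is. I added tortillas to your shopping list."
        return handler_input.response_builder.speak(speech).response

def generate_reply(utterance: str) -> str:
    """Hypothetical LLM-backed path: no intent schema, just the utterance."""
    # Placeholder for whatever endpoint the limited-preview LLM API exposes.
    return f"(LLM response to: {utterance})"

sb = SkillBuilder()
sb.add_request_handler(TacoNightIntentHandler())
lambda_handler = sb.lambda_handler()
```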
Knowing and Doing
To many people, the facility of ChatGPT with humanlike conversation made Alexa look like yesterday’s technology. It’s similar to the way Alexa’s capabilities made Siri look quaint and underpowered. Granted, the comparison between Alexa and Siri makes a lot more sense than the one between Alexa and ChatGPT if you think about what they are designed to do and the scope of their features.
Alexa and Siri were essentially the same type of assistant. Adam Cheyer, the co-founder of Siri and Viv Labs (which became Samsung Bixby), calls these “doing” assistants. They complete tasks for the user. By contrast, he says ChatGPT is a “knowing assistant.” Yes, it can complete tasks. However, the tasks are all about what it knows and how it can apply that knowledge in different use cases. It doesn’t control your smart home.
If you missed my extended interview with Adam Cheyer on the Voicebot Podcast earlier this year, you might find his insights useful in putting the generative AI advances in context regarding assistant functionality. He hypothesizes that ChatGPT is amazing at the “knowing” part, but it will be years before it is capable of “doing” tasks. ChatGPT plugins are supposed to be the bridge to “doing.” However, if you look at how that has progressed so far, you may conclude that Cheyer and his Siri co-founder, Dag Kittlaus, who made similar comments in a separate interview, have a valid point.
Alexa has the “doing” part down and is now adding more sophisticated “knowing” features leveraging LLMs.
Creators vs Consumers
Limp also made an interesting comment about who generative AI applications are built for. He contends that most generative AI applications today are built for content creators and not for content consumers. Alexa is about consumers and everyday use cases.
“For many years now, we’ve been steadfast in our vision about ambient intelligence—a paradigm shift in how customers interact with technology around them. To make that vision a reality, we needed a superhuman assistant, one who’s there when you need it and disappears when you don’t and is always working on your behalf in the background. And we’ve been on that journey to create that superhuman assistant for more than a decade now. But with generative AI, it is now within reach.
“When it comes to AI today, the industry… [is] focused on how you apply generative AI to phones and browsers. Again, that makes a lot of sense because that is where customers are. But today, generative AI has been primarily focused on creators, not consumers. But when you are building an AI like this for the home, you have to think about it very, very differently.”
This is an interesting idea. Character.ai, A. (from SK Telecom), and a few other generative AI solutions are definitely for consumers. ChatGPT is clearly getting attention, but it does lean toward creation. However, aside from ChatGPT, the consumption solutions are not getting most of the headlines. The business use cases that drive productivity are more prominent.
Given its services and business model, Amazon’s focus on the consumption use cases makes sense. It also may point to where the market is headed. There are a lot more creation and knowledge management solutions to be deployed for business users, but the consumer or consumption use cases are truly in their infancy.