MultiOn's Nine-Figure Valuation Highlights How Agents Can Enhance Assistants
ChatGPT and other knowing assistants will benefit from an agent upgrade
Synthedia’s regular readers will be familiar with the difference between knowing and doing assistants. ChatGPT, Gemini, Claude, and other generative AI-enabled assistants know many things. This enables them to fulfill knowledge-based tasks at a quality level that did not exist before 2022. By contrast, voice assistants such as Alexa, Google Assistant, and Siri can do a lot of things. They excel at tasks that control applications.
Siri and Viv Labs co-founder Adam Cheyer was the first to propose the knowing-doing divide among assistants. Synthedia has recently added a “connecting” capability to this framework. However, for today, let’s stick with the original knowing-doing segmentation, as these represent the capabilities users most want from digital assistants.
The divide exists because doing assistants have historically been lacking in the knowing department, and the newer knowing assistants can’t do much in the way of controlling applications. Agents may be the most effective approach to filling this capability gap for knowing assistants and, ultimately, to enhancing what doing assistants can accomplish.
MultiOn Agents
The Information reported that MultiOn will announce a $20 million funding round at a $100 million valuation. That is a rapid valuation rise just 12 months after the company was founded and less than six months after it announced its first product. MultiOn is not a foundation model company. Instead, it customizes open-source large language models (LLMs) and small language models (SLMs) to execute tasks that require application control.
General Catalyst is leading the round, with participation from Forerunner Ventures and Blitzscaling Ventures. Amazon participated in MultiOn’s 2023 seed round and has contributed to the most recent funding.
According to a person involved in the round, Amazon and MultiOn have held discussions in the past few months about whether MultiOn’s agent technology could make the Alexa voice assistant more useful.
From Knowing to Doing
LLMs represent a probabilistic technology that is not necessarily suited to reliably executing tasks in applications. Controlling applications often requires specific steps and data formats to succeed. This is better aligned with deterministic than probabilistic approaches because consistency and repeatability matter. However, MultiOn and other agent development environments are training smaller models to navigate websites independently and, presumably, consistently. What makes AI agents different from procedural automation is that they can negotiate variability.
The strategy of pairing LLMs with agents is designed to combine the knowing and doing elements of a well-rounded assistant. The LLM determines user intent and then calls on the AI agent most likely to fulfill the request. The AI agent then performs the “doing” portion of the request.
Granted, you could go straight to the specialty agent as a user, but that may mean you need a separate agent for every task or task category and would impose a significant cognitive load. Why? Users would need to remember every agent and what tasks they could perform. This may be acceptable for 5-10 different tasks. It is less practical if tasks reach into the dozens.
LLM-enabled assistants, particularly those backed by frontier AI foundation models, have proven versatile at fulfilling a broad range of knowing tasks for users. Many have also added tool-calling or function-calling capabilities. This approach leverages LLM strengths in determining user intent and then passing off the “doing” request to a tool or agent designed for that purpose. It is likely the fastest way for LLM-backed knowing assistants to accumulate broad access to doing skills.
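For concreteness, here is a minimal sketch of that handoff pattern, assuming the OpenAI Python SDK’s function-calling interface. The book_table tool, its parameters, and the run_booking_agent handler are hypothetical placeholders, not MultiOn’s or any assistant vendor’s actual interfaces; only the knowing-to-doing routing is the point.

```python
# Minimal sketch of the knowing-to-doing handoff via LLM function calling.
# The book_table tool and run_booking_agent are hypothetical placeholders.
import json
from openai import OpenAI

client = OpenAI()

# One "doing" tool the model can route to.
tools = [{
    "type": "function",
    "function": {
        "name": "book_table",
        "description": "Reserve a table at a restaurant on the user's behalf.",
        "parameters": {
            "type": "object",
            "properties": {
                "restaurant": {"type": "string"},
                "party_size": {"type": "integer"},
                "time": {"type": "string", "description": "Requested time, e.g. '19:00 tomorrow'"},
            },
            "required": ["restaurant", "party_size", "time"],
        },
    },
}]

def run_booking_agent(args: dict) -> None:
    """Placeholder for the doing agent that would actually drive the reservation site."""
    print("Dispatching booking agent with:", args)

# The LLM handles the "knowing" step: it reads the request and decides which tool to call.
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Get me a table for two at 7pm tomorrow at Nopa."}],
    tools=tools,
)

message = response.choices[0].message
if message.tool_calls and message.tool_calls[0].function.name == "book_table":
    # The "doing" step: structured arguments are handed off to the agent.
    run_booking_agent(json.loads(message.tool_calls[0].function.arguments))
```

In this pattern, the assistant only needs to recognize intent and produce structured arguments; the agent behind the tool owns the application-control problem.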
This approach is also why a young startup can command a nine-figure valuation so soon after its founding and first product launch. The knowing-doing gap is a meaningful shortcoming that the popular knowing assistants have not yet addressed. Agents also give traditional doing assistants, which rely on coding predetermined procedural steps, a path to goal-seeking approaches that figure out how to navigate applications to fulfill a request. If MultiOn can fill this gap for generative AI-based assistants and land a customer like Amazon for Alexa in the doing assistant category, it is poised for rapid growth.
MultiOn already counts Amazon’s Alexa Fund as an investor. Synthedia also covered recent news that suggests Amazon is working on an Alexa upgrade that goes beyond the announcements from its fall 2023 product event. Adding agents, along with more LLM-enabled knowing features, is a logical strategic move for Amazon. Consider that the Alexa Fund’s Paul Berard commented in January 2024:
“MultiOn’s AI Agent is impressive with its ability to connect with virtually any device or interface on the internet. We’re proud to be an investor in MultiOn and fully support their mission of simplifying customers’ lives through AI Agents.”
MultiOn Examples
MultiOn launched a playground in late May 2024 that provides several examples of agent capabilities. Some of these include adding a book to an Amazon shopping cart, looking up the weather on a website, reserving a table at a restaurant, and summarizing the news.
Many of these features existed in earlier-era doing assistants like Alexa and were target use cases for Google Duplex, which was originally demonstrated in 2018. In other words, the use cases are not entirely novel. The approach to fulfilling them is new and long anticipated … provided the agents work as advertised.
I add this note of caution because we have heard many stories in the past about so-called AI agents that turned out to be more brittle, less flexible, and less automated than promised. Google Duplex is a case in point. Its fulfillment of restaurant reservations and salon appointment bookings often involved human intervention.
Rabbit Runs, Agents Negotiate
More recently, Rabbit announced its large action model (LAM) to tackle tasks similar to those MultiOn addresses with agents. However, Rabbit appears to use procedural code and a rules engine to execute some basic tasks. Developers have not found evidence of any AI model other than OpenAI’s models. According to a recent Android Authority article:
Software developers @xyz3va, @schlizzawg, and @MarcelD505 have dug into the servers powering the Rabbit R1. One of the key findings is that the servers running the AI gadget’s interactions apparently aren’t running a Large Action Model (LAM) as previously claimed.
…@xyz3va scoured through the server’s code and claimed that “the LAM is also not an LAM.” She asserts that the company isn’t using AI to figure out app interactions but is instead using a “hardcoded list of locations” to tell the app where to click. This would contradict Rabbit’s claims that Rabbit OS and the Rabbit R1 use an LAM to handle interactions.
Other developers discovered in the source code that Rabbit OS is employing Playwright automation scripts. Playwright is designed to automate application testing, and it turns out it can also be used to control applications through hardcoded procedures.
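As a rough illustration of what such a hardcoded Playwright procedure looks like, consider the sketch below. The site URL and CSS selectors are invented for illustration; the point is that every step is fixed in advance and executed in order.

```python
# A hardcoded, imperative Playwright procedure. The URL and selectors are invented;
# a real script would target one specific site and break if its layout changed.
from playwright.sync_api import sync_playwright

def order_ride(pickup: str, dropoff: str) -> None:
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        # Every step is spelled out in advance and executed in a fixed order.
        page.goto("https://example-ride-app.com")   # hypothetical site
        page.fill("#pickup-input", pickup)          # hardcoded selector
        page.fill("#dropoff-input", dropoff)        # hardcoded selector
        page.click("button#request-ride")           # hardcoded selector
        page.wait_for_selector(".confirmation")     # fails if the page structure changes
        browser.close()

order_ride("Union Square", "SFO")
```

If the target site redesigns its form, each selector has to be updated by hand, which is why this approach scales poorly across many applications.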
The LAM is supposed to be a “super agent” that enables a lot of “doing” tasks in one package and does not require hard-coded scripts to control applications. The ambition of one model to navigate multiple use cases may be a key reason why it has not yet materialized. Or, it may be that the Rabbit team has not yet embarked on LAM development and took a shortcut to bring its first product to market.
Regardless, MultiOn agents take a more practical approach by limiting task scope. This significantly increases the likelihood that an AI model can execute a task that it has been trained on conceptually, even if it has not previously seen the application and its required steps. Narrowing the scope does not ensure success, but it does improve your odds.
Developers might consider what Alexa and Siri do for application control as an imperative programming approach. Rabbit, despite its original claims, is also using an imperative approach to task execution by employing Playwright scripts. The steps are defined and executed in a precise order that is expected to yield the objective. If the application changes, the scripts almost certainly need to change. This makes maintenance a challenge if the objective is to support many apps.
Agents like MultiOn’s are closer to a declarative model. The objective is declared, and the solution determines the steps to fulfill the request. The most prominent example of declarative programming for digital assistants is Samsung’s Bixby, which was built on top of Viv Labs technology. The Alexa Conversations developer feature also took this approach. These weren’t agents per se, but MultiOn’s agents follow a similar declarative pattern: the AI models powering them are tasked with determining the best way to achieve the objective.
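A conceptual sketch of that declarative loop is shown below. This is not MultiOn’s actual API; the Action type and every function are illustrative placeholders standing in for the model and browser-control pieces a real agent would use. Only the goal is fixed up front, and the steps are chosen one at a time at runtime.

```python
# Conceptual sketch of a declarative, goal-seeking agent loop.
# Not MultiOn's API; every function here is an illustrative placeholder.
from dataclasses import dataclass

@dataclass
class Action:
    kind: str        # e.g. "navigate", "click", "type", "done"
    target: str = ""
    value: str = ""

def capture_page_state() -> str:
    """Placeholder: a real agent would return the DOM text or a screenshot."""
    return "<html>...</html>"

def choose_next_action(goal: str, page_state: str) -> Action:
    """Placeholder: a real agent would ask a trained model for the next step toward the goal."""
    return Action(kind="done")

def execute(action: Action) -> None:
    """Placeholder: a real agent would drive the browser (e.g., via Playwright)."""
    print("Executing:", action)

def run_agent(goal: str, max_steps: int = 20) -> None:
    # The caller declares the objective; the agent decides the steps at runtime.
    for _ in range(max_steps):
        action = choose_next_action(goal, capture_page_state())
        if action.kind == "done":
            return
        execute(action)

run_agent("Reserve a table for two at 7pm tomorrow")
```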
What’s Next
AI agents are a key tool that will augment the benefits users already capture from assistants powered by AI foundation models. Agents will deliver task execution to knowing assistants and expand the capabilities of existing doing assistants.
MultiOn is among the first wave of AI agents and is benefitting from early market entry and high-profile investors. The AI agent segment will quickly become very crowded, and foundation model providers such as OpenAI and Google will also become competitors. As a result, it will be critical for MultiOn to scale rapidly. The funding round and the presence of Blitzscaling Ventures among its new investors look well-timed.