Open Interpreter, Do Engines, and Using LLMs to Enable Actions
Do we really need voice-interactive devices to access these features?
Open Interpreter is a growing open-source project that leverages the capabilities of large language models (LLMs) to control computing devices and applications. While LLMs are deft at navigating language-based interactions, including information retrieval, summarization, and text generation, they are not designed to control devices out of the box. Open Interpreter was created to combine the language capabilities of an LLM with a “do engine.”
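To make the "do engine" idea concrete, here is a minimal sketch using the project's pip package and its documented chat entry point. The import path has shifted between releases (older versions used `import interpreter` directly), so treat this as illustrative rather than definitive.

```python
# pip install open-interpreter
# A minimal sketch of the "do engine" idea: the LLM plans and writes code,
# and Open Interpreter executes it locally, asking for confirmation before
# each code block runs by default.
from interpreter import interpreter

# Ask for an action, not just an answer. The model emits code (shell,
# Python, etc.) and the local runtime carries it out.
interpreter.chat("Rename every .png file on my desktop to include today's date")
```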
You may recognize this concept from the January Rabbit device release. However, Open Interpreter has been an open-source project for at least eight months. It started to get more notice over the past two months, and a new announcement on Friday accelerated interest. There are many similarities, but also some important differences, between Rabbit's approach and Open Interpreter's new product demo. Some key concepts to consider include:
Controlling Applications with Your Voice - Demo
Large Action Models vs LLM Accessories
Doing Versus Knowing Assistants
Copyleft, Not Copyright
Key elements of Open Interpreter
The Orchestration Revolution
Likely Reactions from AI Leaders
Controlling Applications with Your Voice
The public aspect of the project began a couple of months ago, and the first releases were text-based capabilities. This week, Killian Lucas, the project’s lead developer, released a video on X highlighting voice interactivity. He demonstrated Open Interpreter 01, a handheld device that connects to the internet.
In the demo, which is re-posted here because videos from X cannot be embedded directly, Lucas shows off the capabilities of the software and the device. He highlights its ability to navigate through applications and complete tasks on a personal computer. Open Interpreter also enables users to teach it a “skill” so it can execute tasks in specific applications. The example demonstrated is creating a Slack message, though you would think a library of pre-built app integrations is imminent. You can also set up a custom routine that triggers an action based on an event.
Open Interpreter 01 is a small disc that enables voice input. It includes a speaker and can be run using your own server or by connecting with a service hosted by Open Interpreter. Lucas said in an X post that the presale allotment for the $99 devices was sold out in 2.5 hours.
The project also released information that will enable other builders to create their own version with common, off-the-shelf parts such as the ATOM Echo Smart Speaker Development Kit, a battery, and a silicone casing. The entire bill of materials is estimated to cost less than $60.
It is an interesting choice to create a device to enable easier access to the software. Today, there is no desktop UI or mobile app. All of the demos from the last couple of weeks employ a command-line interface. The open-source community behind Open Interpreter is likely to pursue a path of easy-to-use apps that leverage the software.
However, using it through an existing app, such as Slack or Discord, may be easier. It will almost certainly become a function call for the ChatGPT Assistant API. Interestingly, both Rabbit and Open Interpreter decided having their own interface to access the software was important. They clearly did not want to be beholden to the restrictions of the App Store or the popular mobile apps. With that said, I expect the devices are just part of each company’s market entry strategy. Both are likely to jettison the device approach as soon as the software secures traction.
Large Action Models vs LLM Accessories
Open Interpreter is not a large action model (LAM). Instead, it is software that works with LLMs to execute “actions” in other software. Or, as the open-source project says,
Open Interpreter lets language models run code…You can also build Open Interpreter into your applications.
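That second point, embedding Open Interpreter in your own application, looks roughly like the sketch below. It follows the project's documented Python API, but attribute and parameter names (such as `auto_run`, `llm.model`, and `display`) have changed between versions, so treat the specifics as assumptions.

```python
# A hedged sketch of embedding Open Interpreter in another application,
# based on the project's documented Python API (names may vary by version).
from interpreter import interpreter

interpreter.auto_run = False      # keep the confirmation step before code executes
interpreter.llm.model = "gpt-4"   # the LLM does the "knowing"; the local runtime does the "doing"

# display=False suppresses terminal rendering so the calling application
# can handle the returned messages itself.
messages = interpreter.chat("Summarize the newest file in ~/Downloads", display=False)

for message in messages:
    print(message.get("role"), message.get("type"), message.get("content"))
```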
Rabbit, by contrast, explicitly describes its Large Action Model as a foundation model that “understands human intentions.” It is unclear whether Rabbit developed a new model or took a base model and conducted additional training, though the “foundation model” terminology suggests that the company intends to deploy a novel AI model.
The developers behind Open Interpreter are not taking that approach. Their software is better thought of as an LLM accessory. Below is a video by Killian Lucas from February of last year, in which he shows how to use the GPT-4 API to add items to a to-do list in Airtable. Open Interpreter is more sophisticated than this early demonstration, but you can see where Lucas got the idea around the time he founded Water AI.
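For a sense of what that early demo stitched together, here is a hedged re-creation of the same pattern (not Lucas's code): the chat API distills a natural-language request into a task, and a plain HTTP call writes it to Airtable. The base ID, table name, field name, and environment variables are placeholders.

```python
# Hypothetical sketch of the LLM-plus-API pattern from that early demo:
# the model handles language, a REST call handles the action.
import os
import requests
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def add_todo(request_text: str) -> None:
    # 1) "Knowing": ask the model to distill the request into a short task title.
    completion = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": "Reply with only a short to-do item title."},
            {"role": "user", "content": request_text},
        ],
    )
    task = completion.choices[0].message.content.strip()

    # 2) "Doing": create a record via the Airtable REST API.
    url = "https://api.airtable.com/v0/appXXXXXXXXXXXXXX/Tasks"  # placeholder base/table
    headers = {"Authorization": f"Bearer {os.environ['AIRTABLE_API_KEY']}"}
    requests.post(url, headers=headers, json={"fields": {"Name": task}}, timeout=10)

add_todo("Remind me to send the quarterly report to finance on Friday")
```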