What is Grok? Is X.ai's Chatbot for Twitter Really Better Than ChatGPT?
7 Things to Know about the New AI Assistant from Elon Musk
Grok is an AI modeled after the Hitchhiker’s Guide to the Galaxy, so intended to answer almost anything and, far harder, even suggest what questions to ask!
Grok is designed to answer questions with a bit of wit and has a rebellious streak, so please don’t use it if you hate humor!
A unique and fundamental advantage of Grok is that it has real-time knowledge of the world via the 𝕏 platform. It will also answer spicy questions that are rejected by most other AI systems.
Grok is still a very early beta product – the best we could do with 2 months of training – so expect it to improve rapidly with each passing week with your help.
Thank you,
the xAI Team
By creating and improving Grok, we aim to:
Gather feedback and ensure we are building AI tools that maximally benefit all of humanity. We believe that it is important to design AI tools that are useful to people of all backgrounds and political views. We also want to empower our users with our AI tools, subject to the law. Our goal with Grok is to explore and demonstrate this approach in public.
Empower research and innovation: We want Grok to serve as a powerful research assistant for anyone, helping them to quickly access relevant information, process data, and come up with new ideas.
Elon Musk was a founder of OpenAI and famously split with the organization, which later transformed from a non-profit into a capped for-profit company. Earlier this year, Musk co-founded X.ai with the vague objective of "understanding the nature of the universe." Musk indicated that OpenAI's rising prominence as a for-profit company developing artificial general intelligence (AGI), combined with its market traction and the capabilities of GPT-4, motivated him to create X.ai as an alternative.
1. Grok is an AI Assistant
Of course, Musk also acquired a social network a year ago, and this weekend X.ai introduced its first product for select X (formerly known as Twitter) users. The product is called Grok. It is a chatbot with direct parallels to ChatGPT. A key difference is that Grok has real-time access to X data, which presumably offers up-to-date and unique information for users.
Musk also refers to Grok as an AI assistant in another Tweet. Information on the new solution is still limited, but it is arriving quickly as some early access beta testers begin to publish their Grok results. Examples of real-time information sharing, answering technical questions, and writing Python code have emerged.
There is a sign-up page for the waiting list, and Musk indicated it will be available exclusively to Twitter Premium+ subscribers. Premium+ is a new subscription tier that will cost $16 per month for U.S. users through the web or $22 if purchased through the iOS App Store or Google Play. Grok is initially limited to U.S. users.
2. Grok is Based on a New LLM
Grok-0 is a 33B-parameter large language model (LLM) and the predecessor of Grok-1. It is not specifically stated whether Grok-1 is also a 33B-parameter model. Most people commenting on Grok today assume it is, but given the performance gains between the two models, it may be larger. Grok-1 has an 8k-token context window (~6,000 words), double the original GPT-4 implementation in ChatGPT, but likely comparable to the current iteration of OpenAI's chatbot. The model was trained on data "up to Q3 2023," which likely means through the end of June 2023 or shortly thereafter.
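The ~6,000-word figure follows from a common rule of thumb of roughly 0.75 English words per token. The short Python sketch below is an illustration, not anything published by X.ai: it assumes that 0.75 ratio, and it assumes "8k" may mean either 8,000 or 8,192 tokens, with a 4,096-token window standing in for the "original" ChatGPT GPT-4 window that Grok-1's is said to double. Real ratios vary with the tokenizer and the text.

```python
# Rule-of-thumb conversion from tokens to English words.
# ASSUMPTION: ~0.75 words per token, a commonly cited heuristic for English;
# the true ratio depends on the tokenizer and on the text itself.
WORDS_PER_TOKEN = 0.75


def tokens_to_words(tokens: int, words_per_token: float = WORDS_PER_TOKEN) -> int:
    """Estimate how many English words fit in a given number of tokens."""
    return round(tokens * words_per_token)


# ASSUMPTION: "8k" could mean 8,000 or 8,192 tokens; 4,096 is an assumed
# stand-in for a window half that size.
for window in (4_096, 8_000, 8_192):
    print(f"{window:,} tokens ~ {tokens_to_words(window):,} words")
```

Either reading of "8k" lands close to the ~6,000 words quoted above, which is why the rounded figure is a reasonable shorthand.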
After announcing xAI, we trained a prototype LLM (Grok-0) with 33 billion parameters. This early model approaches LLaMA 2 (70B) capabilities on standard LM benchmarks but uses only half of its training resources. In the last two months, we have made significant improvements in reasoning and coding capabilities leading up to Grok-1, a state-of-the-art language model that is significantly more powerful, achieving 63.2% on the HumanEval coding task and 73% on MMLU.
3. Grok is Better Than GPT-3.5 and Llama
Grok was benchmarked against a number of leading LLMs, including Llama 2, Inflection-1, PaLM 2, Claude 2, and OpenAI's GPT-3.5 and GPT-4. There are over 200 different LLM benchmarking tests, so presenting only four could mean the data was cherry-picked. However, GSM8K, MMLU, HumanEval, and MATH are some of the most popular. In this head-to-head comparison, X.ai claims that Grok-0 performs similarly to Llama 2 (70B) while using half the training resources.
The company says Grok-1 bests GPT-3.5, is approaching the performance of Google’s PaLM 2, and remains materially behind Anthropic’s Claude 2 and GPT-4. According to the announcement:
On these benchmarks, Grok-1 displayed strong results, surpassing all other models in its compute class, including ChatGPT-3.5 and Inflection-1. It is only surpassed by models that were trained with a significantly larger amount of training data and compute resources like GPT-4. This showcases the rapid progress we are making at xAI in training LLMs with exceptional efficiency.
Since these benchmarks can be found on the web and we can’t rule out that our models were inadvertently trained on them, we hand-graded our model (and also Claude-2 and GPT-4) on the 2023 Hungarian national high school finals in mathematics, which was published at the end of May, after we collected our dataset. Grok passed the exam with a C (59%), while Claude-2 achieved the same grade (55%), and GPT-4 got a B with 68%. All models were evaluated at temperature 0.1 and the same prompt. It must be noted that we made no effort to tune for this evaluation. This experiment served as a “real-life” test on a dataset our model was never explicitly tuned for.
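Temperature controls how random a model's output is. The Python sketch below is a generic illustration, not X.ai's evaluation code: it applies a temperature-scaled softmax to hypothetical scores (logits) for three candidate next tokens, showing why a setting as low as 0.1 makes output close to deterministic, which helps when grading several models side by side.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities; lower temperature sharpens the distribution."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max before exponentiating for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical logits for three candidate next tokens (illustrative values only).
logits = [2.0, 1.0, 0.5]

for t in (1.0, 0.1):
    probs = softmax_with_temperature(logits, t)
    print(f"temperature={t}: {[round(p, 3) for p in probs]}")
```

With these made-up scores, the top token gets roughly 63% of the probability at temperature 1.0 but more than 99.9% at 0.1, so repeated runs almost always pick the same answer.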
Since Grok-1 is currently slated only to power the Grok assistant on X, these benchmarks may not matter much. Businesses are not evaluating whether to tap into its API as an alternative to OpenAI or Anthropic. Still, the benchmark results are intended to demonstrate Grok-1's capabilities and build confidence in the model and the assistant.
4. Grok Supports the Super App Strategy
Musk’s strategy of embedding an AI assistant into a social media app is not novel. Mark Zuckerberg announced the same approach for the Meta AI assistants. It is a smart play for several reasons:
A contextual search and answer chatbot for an information platform, such as X, could become a valued benefit to some power users.
X is primarily a discovery solution today. An information assistant expands that role to become a search solution.
If you have a useful service or an assistant embedded in an app, you will likely engage with that app more frequently and with increased session duration.
If the objective is for X to become a Super App, like WeChat for the West, an assistant could be an invaluable solution for navigation and discovery of app services.
In addition, Grok puts X.ai on the generative AI map and offers a generative AI halo to Musk’s X. There are multiple potential benefits from launching the assistant, particularly if it gains a loyal following.
5. Grok Has a Personality
It appears that X.ai also differentiates Grok with its personality. You will recall that Siri, Alexa, and Google Assistant all attempted to create affinity and build affective trust by expressing friendly personas. Grok is adopting a similar strategy but is positioned as edgy.
This explicit persona-oriented differentiation runs counter to the approach of other generative AI assistants, such as ChatGPT and Bard, which deliberately downplay any distinct persona other than that of an efficient, emotionless assistant. The exception is, once again, Meta. While it has announced the persona-less Meta AI assistant, the company is also rolling out AI Characters that take on the personas of several famous celebrities. Since Meta and X lean more toward consumer use cases, they are betting that persona matters more to their users than it does for the more business-oriented ChatGPT and Bard.
6. Grok is Not Multimodal
A key Grok gap compared to ChatGPT and Google Bard is the lack of multimodal capabilities. The company says it will introduce these features in the future.
Currently, Grok doesn’t have other senses, such as vision and audio. To better assist users, we will equip Grok with these different senses that can enable broader applications, including real-time interactions and assistance.
This should not be much of a hindrance initially, but these features will be critical capabilities for personal AI assistants that achieve broad adoption. Being text-only will be less of a competitive deficiency if Grok does not aspire to be the primary personal AI assistant and instead focuses on being a useful feature for X users.
7. Will Grok Be the ChatGPT Alternative?
ChatGPT dominates the generative AI personal assistant landscape, with credible competitors in Claude 2 and Google Bard and other challengers, such as Poe and Inflection's Pi. There are also applications such as Bing Chat and Perplexity.ai that fulfill many similar functions but are positioned as search-first. Grok is entering a crowded market.
It is also entering the market with only a paid version. ChatGPT Plus costs $20 per month, but there is a free version. Claude 2, Poe, and Perplexity also offer paid plans alongside free tiers. What Grok has going for it is distribution through X, and its earliest adopters will be its current premium subscribers.
However, Grok will need to be much better than its competitors to build a sizeable user base with a $16 per month price tag. Otherwise, it is hard to see it becoming a widely adopted assistant solution. The success of services integrated into X (i.e., Twitter) will likely determine its fate more than raw assistant capabilities.
This is why I don’t expect Grok to have a meaningful impact on ChatGPT’s dominance in the near term. It is unclear what burning need Grok will address. There is no freemium plan at this point. Consumers are not accustomed to the search experience through X, nor do they have a history of using X for assistant functionality. X is primarily a lean-back experience. ChatGPT and the assistant category are lean-in experiences. It will not be easy to shift behavior and expectations. These strike me as significant adoption barriers.
Still, Grok could be wildly successful in terms of X's strategy. There are reasons for consumers to choose Bard or Claude over ChatGPT, but they are task- and user-specific. Most people won't have those motivations. ChatGPT is about to add more features that Bard, Claude, and others are unlikely to match soon.
X has loyal users who may serve as the incubation base for the product. If X can layer on meaningful super app services, it could then grow the value proposition and attract more paying users. It could also introduce a freemium version at some point to expand the user base and gather more user data, but it is unclear whether Musk wants to fund that expense. With all that said, Grok has a decent chance to emerge as the personal AI assistant alternative to ChatGPT, even though it would likely be a distant second.
What will be the alternative to ChatGPT as the copilot for your life? It could be Grok, even though the probability is fairly low. At the same time, Grok could simply become a useful generative AI-enabled feature within X. Why wouldn't every application include an embedded assistant?
The assistant era is just getting started.