This One Factor Could Kill Bing Chat. Google SGE and Perplexity AI are in a Different League.
How long will you wait for a search answer?
Google dominated its early search rivals by delivering higher-quality search results more frequently. However, it was also maniacally focused on speed. That spartan Google search splash page was a sharp contrast to the cluttered portal concept of Yahoo, AltaVista, and the other search giants of the day (and even some of what you see from Bing today).
Another benefit of that simple page with no images was faster load time. This made a difference for many users in a bandwidth-limited web era. Google also spent a lot of time optimizing the speed of its algorithm and investing in making its search engine the fastest. From the fast-loading webpage to quick results, Google optimized the time from thought to “blue link.”
Google has always been about saving time for the user. Microsoft has focused on different objectives. Bing Chat is Microsoft’s latest and maybe the most credible attempt in over a decade to capture some of Google’s search market share. The initiative is led by a new conversational search experience powered by OpenAI’s GPT-4. The quality of the output is often very good. The wait time is not.
How Long Will Users Wait?
You can see from the chart above that Bing Chat’s response latency was nowhere near that of generative search rivals on June 12, 2023. It was over three times that of Perplexity AI and four times that of Google Search Generative Experience (SGE). Surprisingly, Bing Chat cut the time from entering a prompt to receiving a response by about 50% in the week after our initial test. In fact, we held off on publishing the initial results to verify that Bing was truly faster.
While this is a great development for Microsoft and almost certainly took some engineering prowess to pull off, the response time as of June 19, 2023, was still about double that of Perplexity AI and three times that of Google SGE. In an era of instant gratification, consumers may wait longer for higher quality or new types of experiences, but this difference requires a lot of tolerance.
More Waiting for Fewer Words
Another interesting finding is that the additional time it takes to produce a result does not indicate a greater output quantity. The analysis from Voicebot.ai and Synthedia found that the June 12th version of Bing Chat produced responses averaging 82.0 words, or 3.2 words per second of wait time. The June 19th version reduced the average output to 63.6 words while improving to 4.9 words per second of wait time.
By contrast, Perplexity AI delivered the most comprehensive responses at an average of 163.2 words. Google SGE generated answers with an average of 101.6 words. This works out to 22.2 and 23.1 words per second of wait time for Perplexity AI and Google SGE, respectively.
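The word counts and words-per-second figures above can be turned back into approximate wait times. The short Python sketch below does that arithmetic. It assumes that dividing the average word count by the average words per second of wait time gives a fair estimate of the average wait, which is only approximately true when both figures are averages over many queries. The function and variable names are illustrative and not part of the original analysis.

```python
# Average words per response and words per second of wait time,
# as reported in the article for each generative search product.
REPORTED = {
    "Bing Chat (June 12)": (82.0, 3.2),
    "Bing Chat (June 19)": (63.6, 4.9),
    "Perplexity AI": (163.2, 22.2),
    "Google SGE": (101.6, 23.1),
}


def implied_wait_seconds(words, words_per_second):
    """Estimate wait time as words divided by words per second of wait.

    Assumption: the ratio of the two averages approximates the average wait.
    """
    return words / words_per_second


def implied_waits(reported=REPORTED):
    """Return a dict mapping each product to its estimated wait in seconds."""
    return {
        name: implied_wait_seconds(words, rate)
        for name, (words, rate) in reported.items()
    }


if __name__ == "__main__":
    for name, seconds in implied_waits().items():
        print(f"{name}: about {seconds:.1f} seconds")
```

Under that assumption, the estimates come to roughly 25.6 and 13.0 seconds for the two Bing Chat tests, 7.4 seconds for Perplexity AI, and 4.4 seconds for Google SGE. That is consistent with Bing Chat roughly halving its wait in a week while remaining well behind both rivals.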
If Microsoft hopes to convert Google search users to Bing Chat, this is going to be a significant barrier. It is already a challenge to induce a user to try Bing if their search habits are Google-first. And this challenge is compounded by Bing Chat’s requirement to use the Edge browser on a desktop or a dedicated mobile app. Adding wait time on top of this will make converting Google search users even harder.
Consumers interested in a Google search alternative also have Perplexity AI as an option. It is almost as fast as Google SGE. It delivers more comprehensive answers than either existing search engine, and it is available on a variety of desktop browsers as well as through mobile apps. Perplexity AI is popular enough already that it has a rapidly growing subscription product and no ads for free or paid users.
Bing Chat’s Achilles Heel is GPT-4 Inference
Bing Chat is significantly faster than ChatGPT used with the Bing plugin. Bing’s plugin offers ChatGPT Plus subscribers real-time internet access suitable for search. It produces more words per response than any of the other generative search options at 253.1. However, it is so slow to complete a response that it only delivers 2.7 words per second of wait time. The wait time average per completed response was over 90 seconds. That may be permissible for some types of queries, but it is not a viable experience for most common search tasks.
The process of ChatGPT interpreting a prompt, using Bing search to identify sources, and ingesting and summarizing multiple articles before responding takes a long time. The long latency for sometimes very short answers calls into question the viability of the plugin experience.
GPT-4 inference time appears to be the biggest factor in the latency problem. ChatGPT can answer some of the questions in the test because they refer to information or events that took place before September 2021. We found that ChatGPT with the GPT-4 model had an average response time of 41.3 seconds for those questions. That falls between the Bing Chat and ChatGPT with the Bing plugin times. However, ChatGPT with the GPT-3.5 model delivered answers in an average of 6.1 seconds with longer answers than Perplexity and Google SGE.
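The comparison above can be checked with a few lines of arithmetic. The Python sketch below uses only figures quoted in this article. The Bing Chat and Bing plugin wait times are estimates derived by dividing average words by average words per second of wait time, which is an assumption and not a measured value. All names are illustrative.

```python
# Measured averages quoted in the article (seconds per response).
GPT4_SECONDS = 41.3
GPT35_SECONDS = 6.1

# Words per response and words per second of wait time quoted in the article.
PLUGIN_WORDS, PLUGIN_RATE = 253.1, 2.7            # ChatGPT with the Bing plugin
BING_CHAT_WORDS, BING_CHAT_RATE = 63.6, 4.9       # Bing Chat, June 19 test


def implied_wait(words, words_per_second):
    """Estimated wait in seconds (assumption: ratio of the reported averages)."""
    return words / words_per_second


def summarize():
    """Return the derived comparison figures as a dict."""
    return {
        "bing_chat_wait": implied_wait(BING_CHAT_WORDS, BING_CHAT_RATE),
        "plugin_wait": implied_wait(PLUGIN_WORDS, PLUGIN_RATE),
        "gpt4_vs_gpt35": GPT4_SECONDS / GPT35_SECONDS,
    }


if __name__ == "__main__":
    for key, value in summarize().items():
        print(f"{key}: {value:.1f}")
```

On these numbers, GPT-4 took nearly seven times as long as GPT-3.5 on the same questions, and its 41.3-second average sits between the estimated Bing Chat wait of about 13 seconds and the estimated plugin wait of about 94 seconds.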
This is not a like-for-like comparison, but it does likely indicate the source of Bing Chat’s latency. OpenAI has not revealed much about the size of the GPT-4 model in terms of parameters or training tokens, but the model clearly takes longer to generate an answer. GPT-3.5 is very fast and may be faster per word than alternatives. With that said, it is not a search engine, does not have real-time internet access, and does not provide source links.
It is fair to assume the response time for a GPT-3.5-based generative search by Bing would be longer than ChatGPT today when these factors are taken into account. However, it also seems likely the response latency would be materially better than the GPT-4 solution used for Bing Chat today.
Generative AI Search Adoption
Speed is not the only important factor for generative AI search adoption. The ability to get to an answer instead of a blue link and use conversational interactions to refine and extend the search are essential differentiators from traditional search. However, when comparing generative AI search solutions, the quality of the answers, the quality of the source links, and latency all matter a great deal.
Microsoft’s Bing Chat is an important innovation. Microsoft has work to do if the company expects to convert innovation into market share growth. Alternatives from Google and Perplexity provide a superior overall experience, and speed is a relevant factor.