Study Shows Decline in Google Search Quality and Reveals Path for Generative AI Adoption
The search for AGI will not solve this problem
A newly published study set out to evaluate the quality of search results delivered by Google, Bing, and DuckDuckGo over the course of a year. The researchers tracked results for more than 7,000 product review queries, repeating the measurement every two weeks.
In this paper, we systematically investigate for the first time whether and to which degree “Google is getting worse.” We focus on comparative product reviews that offer tests and purchase recommendations as a key indicator of search quality. Such reviews often contain affiliate product links, which refer customers to a seller. The referring entity (the “affiliate”) then receives a commission for clicks or purchases resulting from the referral. Affiliate marketing is essentially built on the trust of customers in the affiliate. However, since users often trust their search engines already, the affiliate inherits this trust as a byproduct of a high ranking. This creates a conflict of interest between affiliates, search providers, and users. With “relevance” being an imperfect metric, affiliates then turn to optimizing rankings instead of investing in high-quality reviews.
…
Our findings suggest that all search engines have significant problems with highly optimized (affiliate) content—more than is representative for the entire web according to a baseline retrieval system on the ClueWeb22. Focussing on the product review genre, we find that only a small portion of product reviews on the web uses affiliate marketing, but the majority of all search results do.
There is a high correlation between affiliate links and search result rankings, and an inverse relationship with article quality. The researchers add a caveat: “The inverse is not necessarily true, meaning that our page features are not effective SEO exploits.” However, the correlation between low-quality content and high search rankings increased over the year of tracking.
Can We Generalize the Findings?
The report’s findings are compelling. However, it is fair to ask an obvious question. Assuming we believe the researchers’ conclusions about the erosion of product review quality, can we generalize that finding to other domains?
It is fair to point out that product reviews make up a fairly small percentage of total web content. The characteristics of websites packed with product reviews make them suitable for analysis at scale, and the profit motive ensures publishers will favor SEO techniques that drive traffic. That doesn’t mean they are representative of the average content. A smaller percentage of publishers may focus on SEO in other topic domains. While this is almost certainly true of many domains, the results offer two important insights:
Motivated publishers that optimize for search results and clicks over content quality can achieve the top rankings in Google, Bing, and DuckDuckGo, despite what the search engine companies may say about filtering out spam websites.
It does not require many publishers in any domain to take this SEO-first approach to push high-quality sites off the first two search engine results pages (SERPs), virtually ensuring users will not find them.
The findings also confirm what many search users have concluded from their own anecdotal experience: search quality is decreasing.
I have long contended that search biases too much toward recency, news sites, and big brands. This may align with some search users’ goals. However, if you want to discover information that is not recent and is not published by a major media outlet or company, you may find that it is virtually invisible. The first several pages of search results will not be what you want. And there are reasons for this beyond loopholes in search results filtering.
Incentives Align Around SEO Over Quality
There is little doubt that Google wants users to be satisfied with its search product. However, the optimal search experience conflicts with the most lucrative revenue model. If users don’t get what they need from the first site they visit after a Google search, they are likely to return to Google and try other results. Each return trip offers Google the chance to place more ads in front of users.
According to Gizmodo, a key search executive raised concerns about this conflict in internal emails that were submitted as evidence in an ongoing legal trial.
Lawyers for the government on Monday displayed a series of 2019 emails between then head of Google search Ben Gomes and colleagues where the executive expressed fears his team was “getting too involved with ads for the good of the product and company.” In later emails, Gomes said he was worried his team was “getting too close to the money.”
…
The documents point to a “Code Yellow” issued at the company for seven weeks following concerns it might fall short of its search revenue goals for the first quarter of 2019. Gomes, according to the documents, felt like he and others on the product side were tasked with focusing too much on revenue solutions.
Gomes reportedly sparred with Google over its decision to set its metrics on the total number of user queries. The former head of search reportedly balked at this metric because an improved search functionality should ideally prioritize answering users’ questions with as few clicks as possible. Google, the DOJ argued, benefits from users taking longer to search because the company can run ads against each of those queries. Around 80% of Google revenues reportedly come from advertising. If a user needs to refine their search a few times to get what they’re looking for, or if they have to scroll deeper through the results, more ads can be served to them.
What It Means
Trust is a perishable asset. It is hard to build and evaporates quickly when expectations are not met. This happens faster when there is a superior alternative and when users conclude the provider is not acting in good faith. Google is not merely facing a crisis because generative AI provides a better experience than traditional search. An answer does fulfill a user’s intent faster than links to websites that may or may not have reliable information. However, the rise of better search has coincided with a decline in the quality of traditional search.
Michael Spencer suggests that Google’s problems extend well beyond a decline in search quality. He contends that weak leadership has eroded the company’s commitment to commercializing innovation, which predictably has led to a critical talent exodus related to generative AI. Spencer wrote this week:
Reports from inside of Google and many of those who have left point to a lack of visionary leadership talent at Google. Diane Hirsh Theriault, a software engineer at Google, criticized the tech giant’s leadership on LinkedIn earlier last week. The problem is, it’s so much worse than this in reality.
Sundar Pichai has to go. Despite record profits, a number of Google executives realize and are worried that the company is suffering from both its size and leadership from its C.E.O., Sundar Pichai…
The culture at Google has been stagnating and divided for quite some time. Advertising profits haven’t been funneled into meaningful research or revolutionary products. Moonshots and talent retention have produced poor results in recent years…
Google is doing layoffs to set up for one last moonshot run at AGI, and they will fail. There’s a very small chance that AGI when it is discovered occurs at Alphabet at all.
This goal will be even harder to achieve when you consider that all eight authors of the Google-sponsored “Attention Is All You Need” research paper that kicked off the generative AI Cambrian explosion have left the company. Many more have also departed, and they lead some of the companies that Google’s AI team is trying to outcompete.
Google's generative AI search and chat solution, built on the LaMDA large language model (LLM), preceded ChatGPT. It was available to some early access testers (including me) and had a narrow topic domain, but that was by choice. I compared the two solutions in my first review of ChatGPT in December 2022.
Four weeks later, I wrote about Google’s Innovator’s Dilemma:
A little-covered story from CNBC in December hinted at Google’s rationale for its response or non-response to the ChatGPT phenomenon. The summary bullet points from the article were:
Google employees asked executives at an all-hands meeting whether the AI chatbot that’s going viral represents a “missed opportunity” for the company.
Google’s Jeff Dean said the company has much more “reputational risk” in providing wrong information and thus is moving “more conservatively than a small startup.”
CEO Sundar Pichai suggested that the company has chat products underway for 2023.
…
The Innovator’s Dilemma is a 1997 book by Clayton Christensen that describes why established market leaders often struggle to adjust when new technologies or behaviors begin to take hold in their industries. Despite having more knowledge and resources, they often cede market share to smaller competitors and sometimes lose their entire market advantage over time…
Incumbent market leaders have assets. They loathe undermining the value of those assets. Some of those assets are financial, others are intellectual property, others are customer and business relationships, and some are reputational. This means that incumbents believe they have less latitude when responding to market changes because they must balance the new opportunity without impairing the value they have already created.
…
Google has created a very profitable ad business on the back of its current search model. It is not clear if its current ad model would carry over directly to a conversational search model, but it likely would need to change, and maybe significantly. Financial implications invariably receive a lot of scrutiny and can further delay the introduction of new products that threaten the cash cow…The question today is how well Google will navigate its innovator’s dilemma.
The company’s failure to bring generative AI search and chat to market was a choice. The ChatGPT phenomenon punctured that reluctance, and about four months later, Google released the Bard chat assistant. Shortly after that, the PaLM LLM was released, but only to select early-access users. Neither matched the quality of OpenAI’s products, and Gemini, introduced a full year after ChatGPT and GPT-3.5, didn’t quite match the older model, much less GPT-4.
So it is with search. Microsoft Copilot (formerly Bing Chat) has not meaningfully eroded Google’s search dominance. However, it does offer better search quality, as does Perplexity. Google has its own Search Generative Experience, but the company has made no meaningful effort to promote users' transition to the new model. That stealth approach keeps users in their comfort zone of traditional search and maintains Google’s very profitable search business.
Habits are hard to change, but the status quo will not last forever. The sentiment that Google search quality is degrading now appears to be validated. Google may hope for AGI’s imminent arrival in a Mountain View lab, but it might find more immediate success and a more sustainable moat by maintaining leadership in search quality.
The famous “Google has no moat, and neither does OpenAI” memo from early 2023 suggested that Google was about to lose the race to AGI to open-source foundation model builders. That has not yet come to pass, though models such as Llama 2, Mistral, and MPT have impressed users. However, Google’s biggest threat is not OpenAI or open-source. Users will depart if Google fails to maintain or even re-establish a lead in search results quality. The researchers from Leipzig University, Bauhaus-Universität Weimar, and ScaDS.AI confirmed what we were all thinking. Google search must improve. Hope is not a strategy, nor is betting it all on AGI.