Stack Overflow Completes its Generative AI Reversal with an OpenAI Partnership
OpenAI wants data and Stack Overflow hopes to retain relevance
In December 2022, Stack Overflow banned the use of generative AI and specifically called out ChatGPT. This week, the company announced a new “API partnership” with OpenAI. According to the announcement:
OpenAI and Stack Overflow are coming together via OverflowAPI access to provide OpenAI users and customers with the accurate and vetted data foundation that AI tools need to quickly find a solution to a problem so that technologists can stay focused on priority tasks. OpenAI will also surface validated technical knowledge from Stack Overflow directly into ChatGPT….
As part of this collaboration:
OpenAI will utilize Stack Overflow’s OverflowAPI product and collaborate with Stack Overflow to improve model performance for developers who use their products. This integration will help OpenAI improve its AI models using enhanced content and feedback from the Stack Overflow community and provide attribution to the Stack Overflow community within ChatGPT to foster deeper engagement with content.
Stack Overflow will utilize OpenAI models as part of their development of OverflowAI and work with OpenAI to leverage insights from internal testing to maximize the performance of OpenAI models.
The benefit to OpenAI is clear. It wants access to data. Stack Overflow has a lot of technical data, including metadata based on user comments and upvotes. This will help OpenAI provide better answers in ChatGPT and train its models with more robust data.
Stack Overflow may receive some revenue from the deal, but no terms were disclosed. However, the deal ensures that Stack Overflow citations will start showing up in ChatGPT responses, and the company has a competent LLM provider powering its OverflowAI solution for developers. This latter element seems necessary for Stack Overflow to provide value to its users but may be insufficient to ensure its survival.
The Reversal
Stack Overflow executives were not alone in their strong negative reaction to the rise of ChatGPT. However, the company stood out in the technology community for the vehemence of its opposition. A policy posted in December 2022 read:
All use of generative AI (e.g., ChatGPT and other LLMs) is banned when posting content on Stack Overflow.
This includes "asking" the question to an AI generator then copy-pasting its output as well as using an AI generator to "reword" your answers.
By April, the company was beginning to moderate its dogmatic approach, and in July 2023, the company announced OverflowAI. The AI train had left the station, and Stack Overflow saw that its influence was quickly being diminished by ChatGPT, GitHub Copilot, and other offerings.
If a generative AI assistant answers questions for you or writes the code based on a natural language request, what is the value of sifting through a pile of Stack Overflow posts and comments?
Consider what has happened to Stack Overflow since ChatGPT’s introduction on November 30, 2022. Its Google Trends chart, which tracks organic searches, shows a steady decline. Stack Overflow searches today are half of what they were at the onset of 2023.
Stack Overflow’s search traffic recovered briefly around the time it announced OverflowAI, marking its reluctant embrace of generative AI. Since then, it has continued its downward trend.
You can compare that with Quora, another crowdsourced knowledge platform that also saw an immediate decline in organic search traffic after ChatGPT’s launch. In the spring of 2023, Quora embraced generative AI and introduced Poe, a homegrown platform for accessing assistants based on various generative AI models and answering questions. That led to a strong rise in Quora searches.
Generative AI and Knowledge
Crowdsourced knowledge bases filled an important gap in the era that followed the rise of Web 2.0. However, their value today is less about providing information to users than providing data to AI foundation model builders. It may be that Stack Overflow can survive based on its community and collaboration solution. However, that is not guaranteed.
Software development knowledge bases turned out to be particularly vulnerable to the new code-generation solutions. It is not that their value has disappeared entirely. The reality is that websites like Stack Overflow are less valuable today than two years ago, because new sources now exist within software development tooling to fulfill a similar need. It may be that Stack Overflow is about to fall below the critical mass of regular users and contributors it requires to remain relevant and stay in business.
The link with OpenAI follows an effort by the generative AI leader to amass higher-quality curated data sources to train future models. Deals with news organizations such as the Financial Times and AP look very similar to what appears to be the arrangement with Stack Overflow. The key difference is that the data is related to news rather than code.
Stack Overflow is valuable today because it has a significant back catalog of vetted information. If it doesn’t continue to add to that information set, the value may be fleeting. It will be interesting to see if anyone adopts OverflowAI when they can just get the information from ChatGPT, GitHub Copilot, or another pair programmer. Regardless, Stack Overflow needs to continue to nurture its community and extract information from developers in order to remain relevant.