How Will Truth Prevail? New Methods Emerge for Truth Checking in ChatGPT and LLMs
This is a far more pressing problem than AI-content watermarking.
“They will do no wrong. They will tell no lies.” Bible NIV, Zephaniah 3:13, cited directly.
“Ask me no questions, and I’ll tell you no fibs.” From The Rehearsal by George Villiers c.1672, as attributed by ChatGPT ☹️
“Ask me no questions, and I’ll tell you no fibs.” From She Stoops to Conquer by Oliver Goldsmith, c.1773, cited by numerous online sources and the original text. This is what ChatGPT meant to source. 😀
“Them that ask no questions isn’t told a lie.” From A Smuggler’s Song by Rudyard Kipling c.1906, cited in the original text. 😀
“Ask me no questions, and I'll tell you no lies.” From the song of the same name, written by David Saxon and Robert Wells and performed by Bing Crosby c.1950, copyright Sony/ATV Music. ChatGPT failed to identify this. 🎤
“Don’t ask me no questions, and I won’t tell you no lies.” From the song Don’t Ask Me No Questions by Gary Robert Rossington and Ronnie Van Zant c.1974, copyright Universal Music, also correctly mentioned by ChatGPT as a variant of the phrase. 🤘
“Ask me no questions, I’ll tell you no lies.” From the song Rotten Apple c.1995, cited incorrectly by ChatGPT. No reference to “questions” or “lies” is included in the lyrics. ☹️
While many people are worried they won’t be able to identify AI-generated content, the usual telltale sign is the inclusion of factual errors. There is far more to gain by addressing the factual errors.
Let’s face it. The promulgation of falsehoods has always existed. It has simply scaled up with the printing press, the internet, and now AI.
If large language models (LLMs) cannot be deemed trustworthy in their content generation, adoption will slow, and the value of some AI-based solutions will be limited. That doesn’t mean generative text models have no value if they cannot be consistently correct. It is just that they will have more value if they are correct more often.
Confronting a Confederacy of Errors
There are errors printed in books, but we did not stop using books. There are errors on the internet, but we did not abandon it as a source of information sharing and research. We accept that humans make errors — sometimes accidentally and sometimes willfully — and take on some responsibility for verifying the information we consume. So it will be with AI, with its mix of human and machine-attributed authorship. We will learn how to manage the trade-offs. We are also likely to have better tools very soon.
Keep in mind there are two different issues here that are often conflated by the humans talking about these topics. There are hallucinations, which are falsehoods concocted by the AI model, and there are information gaps, which naturally emerge from what a model is trained on and how often it is updated to incorporate new information.
On the hallucination front, two of my recent guests on the Voicebot Podcast have addressed this question directly. One of them talked at length about a new “truth checker” AI model (now called CheckGPT) that companies can use to scan their GPT-3 generated content before it appears in a customer chat conversation.
In terms of information gaps, the Stanford Institute for Human-Centered Artificial Intelligence (HAI) has recently described three new models to address the problem of LLM-generated falsehoods. One of those may also help with the hallucination problem.
Hallucinations and information gaps are problems, but we might have workable solutions to address some of the shortcomings reasonably soon. Will these techniques deliver 100% accuracy? I expect that will arrive sometime after your Google search results and the world wide web are 100% accurate. So, the answer is no. However, practical usability is the real target.
The Hallucination Conundrum
Hallucination is the term of art that describes the phenomenon of large language models inventing facts or attributing facts to an incorrect origin. This can occur even if the falsehood does not exist in the training data. Thus the term hallucination.
Keep in mind we are not talking about differences of opinion. Recent moves by OpenAI suggesting the emergence of user-customizable versions of ChatGPT are designed to address the issues of opinion and bias. Factual errors, like the originator of a quote, are different. Granted, a quotation could be attributed to multiple people. Hallucinations are present when the LLM attributes the quotation to a person or work that it clearly should not.
Quotations, of course, are typically low-risk errors. They could rise to high risk if quotations were falsely attributed to a political figure, and a misunderstanding led to international tension. They could also rise to a high-risk category if erroneous medical advice was attributed to a leading healthcare figure. The level of risk determines how highly we should value accuracy.
Checking LLM Truthfulness at Runtime
We want to enable users to make them more efficient. And if you provide wrong information, of course, it goes against their brand. That is where we realized [a] truth checker is the thing which will be required. It’s not just a model which will tell you correct or incorrect. It is a suite of models which identifies if the information is presented in the right format…You need to verify if the information is right or wrong. In fact, that is the reason why, predominantly, everyone has stayed away from it. ChatGPT came out. Everyone was excited. Let’s do it. They did their analysis and came back and said, ‘not for voice agents’ [for customer support] because it needs to be accurate; it hallucinates. That’s where we came in. The idea is that to enable these generative LLMs to provide a ChatGPT-like experience for enterprise, it needs to be accurate.
Got-It AI was working on a “truth checker,” now called CheckGPT, before the arrival of ChatGPT. The widespread interest in using GPT-3 to automate company responses to customer questions has led to “truth checking” rising quickly in importance. In my interview with Khatri, he talked about a suite of models that will be used to make LLM-generated content safe for enterprise use. He does not identify these directly, but we can infer from his comments that they include the following stages (a rough sketch of how they might fit together appears after the list):
Validating what the user requested
Reviewing the content generated by the LLM
Checking the generated content against databases with verified information
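Got-It AI has not published how CheckGPT works internally, so the following is only a minimal sketch of how a runtime truth-checking pipeline along these lines might be wired together. Every function, threshold, and data value here is hypothetical; a production system would use trained classifiers and a real enterprise knowledge base rather than the toy keyword matching shown.

```python
# Hypothetical sketch of a runtime "truth checker" pipeline. These components
# are not Got-It AI's; they only illustrate the three inferred stages:
# validate the request, generate a draft with the LLM, and verify the draft
# against trusted reference data before it reaches the customer.

VERIFIED_FACTS = {
    # Toy stand-in for an enterprise knowledge base of vetted answers.
    "return policy": "Items may be returned within 30 days with a receipt.",
    "support hours": "Support is available 9am to 5pm ET, Monday through Friday.",
}


def validate_request(user_query: str) -> bool:
    """Stage 1: confirm the request is in scope before calling the LLM."""
    return any(topic in user_query.lower() for topic in VERIFIED_FACTS)


def generate_answer(user_query: str) -> str:
    """Stage 2: placeholder for the LLM call (e.g., a GPT-3 completion)."""
    return "You can return items within 30 days if you have a receipt."


def verify_answer(user_query: str, answer: str) -> bool:
    """Stage 3: check the generated text against verified information.
    A crude token-overlap score stands in for a trained verifier model."""
    for topic, fact in VERIFIED_FACTS.items():
        if topic in user_query.lower():
            fact_tokens = set(fact.lower().split())
            overlap = set(answer.lower().split()) & fact_tokens
            return len(overlap) / len(fact_tokens) > 0.5
    return False


def answer_customer(user_query: str) -> str:
    """Run the full pipeline and fall back to a human when verification fails."""
    if not validate_request(user_query):
        return "Let me route you to a human agent."
    draft = generate_answer(user_query)
    if verify_answer(user_query, draft):
        return draft
    return "I want to be sure I get that right. Transferring you to an agent."


print(answer_customer("What is your return policy?"))
```

In practice, each stage would likely be its own model rather than a hand-written rule, which is consistent with Khatri's description of a suite of models rather than a single checker.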
There could also be more “services” that get applied in the course of answering a question. You might wonder whether it is worth it to run additional AI models and pay for all of the GPT-3 tokens that generate customer answers. Several companies are confident it will be because of the significant cost offset compared to handling these requests with human agents. But this depends on minimizing false statements and avoiding hallucinations.
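To make that trade-off concrete, here is a back-of-the-envelope comparison. Every number in it is an assumption chosen purely for illustration (token price, tokens per answer, verification overhead, and agent cost), not a figure from Got-It AI, OpenAI, or any contact center benchmark.

```python
# Back-of-the-envelope cost comparison: automated answer vs. human agent.
# All figures below are illustrative assumptions, not quoted prices.

PRICE_PER_1K_TOKENS = 0.02     # assumed LLM API price in USD
TOKENS_PER_ANSWER = 500        # assumed prompt + completion size for one reply
CHECKER_OVERHEAD = 2.0         # assume truth checking roughly doubles model cost
HUMAN_COST_PER_CONTACT = 5.00  # assumed fully loaded cost of an agent contact

automated = (TOKENS_PER_ANSWER / 1000) * PRICE_PER_1K_TOKENS * CHECKER_OVERHEAD
print(f"Automated answer: ${automated:.3f} vs. human agent: ${HUMAN_COST_PER_CONTACT:.2f}")
# Even with verification overhead, the automated path costs a few cents per
# contact, provided the answers are accurate enough to send at all.
```

The math only holds, of course, if failed verifications and hallucinations do not push too many conversations back to human agents anyway.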
Reducing AI Model Information Gaps
The Stanford HAI research is focused on making the models more truthful over time by updating information. It considered three approaches.
MEND — This approach updates the base LLM when information needs to be updated, either because it is new or because it has changed. The model is retrained to include the new data. One example cited is correctly identifying the current UK Prime Minister. This is equated to telling a student new information so they can cite it correctly in future conversations. The approach would require very frequent model retraining.
SERAC — Instead of retraining the model, the SERAC approach relies on creating a “notebook” of updated information that the model can consult before it answers a question. This is similar to CheckGPT, though it does not necessarily occur at runtime. The notebook is updated both proactively and reactively as errors are identified to improve the likelihood that the next query will be answered correctly.
ConCoRD — This is also a “notebook”-style model, but it is more focused on internal consistency than factual accuracy. It tracks what the model has generated previously and can default to the most common answers. That can help prevent the model from providing different answers to the same question and, hopefully, default to true statements. This approach is more closely associated with addressing the hallucination problem.
With SERAC and ConCoRD, there is the benefit that the “notebooks” can become training data that is employed to retrain the base model from time to time. Their presence between training sessions could also significantly reduce the cost and time associated with retraining.
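The Stanford methods rely on trained models to decide when a notebook entry applies and how to reconcile conflicting outputs; the sketch below is only a toy illustration, with made-up data, of the two underlying ideas: consulting a store of corrections before trusting the base model (SERAC) and defaulting to the model's most common prior answer for consistency (ConCoRD).

```python
# Toy illustration of the "notebook" idea behind SERAC and ConCoRD.
# The published methods use learned components to decide when an edit applies;
# exact-match lookups are substituted here to keep the mechanics visible.

from collections import Counter, defaultdict

# SERAC-style notebook: corrections recorded outside the frozen base model.
notebook = {
    "who is the uk prime minister?": "Rishi Sunak",  # example correction entry
}

# ConCoRD-style history: prior answers tracked to favor internal consistency.
history = defaultdict(Counter)


def base_model(question: str) -> str:
    """Stand-in for the frozen LLM, which may hold stale or inconsistent facts."""
    return "Boris Johnson"


def answer(question: str) -> str:
    key = question.lower().strip()
    # 1. Consult the notebook of updates before trusting the base model.
    if key in notebook:
        reply = notebook[key]
    else:
        reply = base_model(question)
        # 2. If this question has been answered before, default to the most
        #    common prior answer so the model does not contradict itself.
        if history[key]:
            reply = history[key].most_common(1)[0][0]
    history[key][reply] += 1
    return reply


print(answer("Who is the UK Prime Minister?"))  # served from the notebook
```

In a real system, the exact-match lookup would need to be replaced by a model that judges whether a stored entry is relevant to the incoming question, since users rarely repeat a question verbatim.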
All of the Above = Falsehood Mitigation
It seems obvious that none of these solutions alone will solve the problem of inadvertent falsehood generation. However, a cocktail of tools is likely to move the level of accuracy to a point where LLMs can be confidently employed in more use cases.
This is an important point. Some use cases are viable today, even with the current levels of accuracy/inaccuracy. As accuracy increases, more use cases will be viable targets for LLM-enabled features. The adoption of generative AI tools will be governed by a curve and not a binary equation.
You might also note that none of the solutions listed above directly addresses the problem of hallucinations. They address the outcome of hallucinations by attempting to mitigate the incidence and impact of false information. That is because the origin of the hallucination problem is not well understood. Identifying that a problem exists is far different from understanding the root cause. For now, the approach will be mitigation as opposed to solving the problem.