To start off, not all RAGs are of the same caliber. The accuracy of the content in the custom database is critical for solid outputs, but that isn’t the only variable. “It’s not just the quality of the content itself,” says Joel Hron, a global head of AI at Thomson Reuters. “It’s the quality of the search, and retrieval of the right content based on the question.” Mastering each step in the process is critical, since one misstep can throw the model completely off.
“Any lawyer who’s ever tried to use a natural language search within one of the research engines will see that there are often instances where semantic similarity leads you to completely irrelevant materials,” says Daniel Ho, a Stanford professor and senior fellow at the Institute for Human-Centered AI. Ho’s research into AI legal tools that rely on RAG found a higher rate of mistakes in outputs than the companies building the models found.
Which brings us to the thorniest question in the discussion: how do you define hallucinations within a RAG implementation? Is it only when the chatbot generates a citation-less output and makes up information? Is it also when the tool may overlook relevant data or misinterpret aspects of a citation?
According to Lewis, hallucinations in a RAG system boil down to whether the output is consistent with what’s found by the model during data retrieval. The Stanford research into AI tools for lawyers broadens this definition a bit, examining whether the output is grounded in the provided data as well as whether it’s factually correct. That is a high bar for legal professionals, who are often parsing complicated cases and navigating complex hierarchies of precedent.
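To make the gap between those two definitions concrete, here is a minimal sketch in Python. It is an illustration only, not code from Lewis or the Stanford team: the `Judgment` record and both function names are hypothetical, and the two flags are assumed to come from a human reviewer rather than from any automated check.

```python
from dataclasses import dataclass


@dataclass
class Judgment:
    """Hypothetical reviewer verdict on a single RAG answer."""
    grounded: bool  # the answer is supported by the retrieved documents
    correct: bool   # the answer is factually right


def is_hallucination_narrow(j: Judgment) -> bool:
    # Narrower view: only consistency with the retrieved content matters.
    return not j.grounded


def is_hallucination_broad(j: Judgment) -> bool:
    # Broader view: the answer must be both grounded and factually correct.
    return not (j.grounded and j.correct)


# An answer that faithfully repeats a retrieved document that is itself wrong
# or outdated: consistent with retrieval, but not factually correct.
faithful_but_wrong = Judgment(grounded=True, correct=False)
narrow_verdict = is_hallucination_narrow(faithful_but_wrong)
broad_verdict = is_hallucination_broad(faithful_but_wrong)
```

Under these assumptions, an answer that accurately repeats a flawed source passes the narrower test but fails the broader one, which is the kind of case the stricter definition is meant to catch.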
While a RAG system attuned to legal issues is clearly better at answering questions on case law than OpenAI’s ChatGPT or Google’s Gemini, it can still overlook the finer details and make random mistakes. All of the AI experts I spoke with emphasized the continued need for thoughtful human involvement throughout the process to double-check citations and verify the overall accuracy of the results.
Law is an area where there’s a lot of activity around RAG-based AI tools, but the process’s potential is not limited to a single white-collar job. “Take any profession or any business. You need to get answers that are anchored on real documents,” says Arredondo. “So, I think RAG is going to become the staple that is used across basically every professional application, at least in the near to mid-term.” Risk-averse executives seem excited about the prospect of using AI tools to better understand their proprietary data, without having to upload sensitive info to a standard, public chatbot.
It’s critical, though, for users to understand the limitations of these tools, and for AI-focused companies to refrain from overpromising the accuracy of their answers. Anyone using an AI tool should still avoid trusting the output entirely, and they should approach its answers with a healthy sense of skepticism even if the answer is improved through RAG.
“Hallucinations are here to stay,” says Ho. “We do not yet have ready ways to really eliminate hallucinations.” Even when RAG reduces the prevalence of errors, human judgment remains paramount. And that’s no lie.