Analyzing RAG Failure Cases
This tutorial helps you analyse the causes of failure in a RAG pipeline.
By the end of this tutorial, you will be able to:
- Understand the failure cases in a RAG pipeline
- Perform Root Cause Analysis on your RAG pipeline
- Get actionable insights to improve your RAG pipeline
Let’s start by understanding what RAG is and how this tutorial will help you.
What is RAG?
RAG (Retrieval-Augmented Generation) is the process of utilising external knowledge in your LLM-based application.
For example: Imagine you have a knowledge document outlining various scenarios for handling customer queries (question). With an LLM-powered bot at your disposal, the goal is to provide users with accurate responses based on the information in the knowledge document.
You can give an LLM relevant chunks of this information (retrieved context) so it can provide a better answer to the user’s query. The LLM can utilize a certain portion of this retrieved context (the cited context) to generate a response.
How will this tutorial help?
Let’s say you already have a RAG pipeline but you are not satisfied with the quality of responses you are getting.
Figuring out the root cause of this failure might be a bit difficult as RAG involves multiple steps and you would have to go through each step to figure out what went wrong.
In this tutorial, we will walk you through an easy way to figure out the failure cases in your RAG pipeline. Let’s look at some major failure cases first:
| Failure Case | Explanation | Example |
|---|---|---|
| Poor Context Utilization | The citations from the context are irrelevant to the user’s query | For the question “Can I get a refund?”, the LLM cites information on offers rather than refunds from a context containing information on both refunds and offers |
| Poor Retrieval | The context given to the LLM does not have information relevant to the question | The user asks “Do you deliver to Bangalore?” but the context does not have any information about deliveries in Bangalore |
| Hallucinations | The generated response is not supported by information present in the context | The LLM generates the response “We deliver to Bangalore” when the information present in the context is: “We are going to start deliveries in Bangalore soon” |
| Poor Citation | The generated response cannot be verified with the citation | The LLM cites “We deliver to Delhi” from the context for a response saying “We deliver to Bangalore” |
| Incomplete Question | The user’s question itself does not make sense | The user asks something like: “When delivery?”, “What location?” |
How does it work?
Let’s jump to the code
Install UpTrain
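UpTrain can be installed from PyPI:

```bash
pip install uptrain
```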
Let's define a sample dataset to run evaluations
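Below is a hypothetical datapoint; the company name “FedL” and all of the text are illustrative (not taken from UpTrain’s docs) and mirror the delivery example discussed later in this tutorial:

```python
# A hypothetical datapoint; "FedL" and the text below are illustrative only.
data = [
    {
        "question": "Do you deliver to Bangalore?",
        "context": (
            "FedL was established in 2020. We currently deliver to Delhi and Mumbai. "
            "We are going to start deliveries in Bangalore soon."
        ),
        "response": "FedL was established in 2020.",
        "cited_context": "FedL was established in 2020.",
    }
]
```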
UpTrain uses these 4 parameters to perform RCA on your RAG pipeline:
| Parameter | Explanation |
|---|---|
| question | This is the query asked by your user. |
| context | This is the context that you pass to the LLM (retrieved context). |
| response | The response generated by the LLM. |
| cited_context | The relevant portion of the retrieved context that the LLM cites to generate the response. |
Perform failure analysis using UpTrain
Here we will be using an instance of EvalLLM to perform RCA on your RAG pipeline.
You need an OpenAI key to generate evaluations using UpTrain.
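Here is a minimal sketch assuming UpTrain’s RcaTemplate interface; check the UpTrain docs if the names differ in your version:

```python
from uptrain import EvalLLM, RcaTemplate

OPENAI_API_KEY = "sk-*****************"  # replace with your own key

# Create an EvalLLM instance and run root cause analysis on the data defined above
eval_llm = EvalLLM(openai_api_key=OPENAI_API_KEY)

results = eval_llm.perform_root_cause_analysis(
    data=data,
    rca_template=RcaTemplate.RAG_WITH_CITATION,
)
```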
Let's look at the results
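One way to inspect the output for the first datapoint (assuming results is a list of dictionaries):

```python
import json

# Pretty-print the RCA output for the first datapoint
print(json.dumps(results[0], indent=3))
```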
Key information present in your results:
| Parameter | Explanation |
|---|---|
| error_mode | The specific failure reason identified in your data |
| error_resolution_suggestion | Actionable insights to improve your RAG pipeline |
Besides this, the results also provide scores for different aspects of your data, along with reasoning.
You can also look at our docs to know more about these evaluations.
Here’s a sample response generated on the datapoint defined above:
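The exact output depends on your data and UpTrain version; an illustrative (not actual) response for the datapoint above might look like this:

```json
{
   "question": "Do you deliver to Bangalore?",
   "cited_context": "FedL was established in 2020.",
   "error_mode": "Poor Context Utilization",
   "error_resolution_suggestion": "Retrieval works fine; improve how the LLM selects citations from the retrieved context."
}
```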
Here we can see that the user is asking about a specific delivery location but the LLM has cited irrelevant information on when FedL was established.
Hence the failure case is Poor Context Utilization