Validation
Overview: In this example, we will see how you can use UpTrain to ensure that your LLM responses are adequate before you use them in downstream tasks. The validation is performed by a list of defined checks. If the LLM's response is invalid, UpTrain keeps retrying until the model returns a valid one. We use a Q&A task as the running example.
Why is validation needed: LLMs are great, but they are not 100% reliable. Downstream tasks often require the LLM response in a particular structure, and the response sometimes deviates from the required format, which causes all sorts of problems. LLMs can also hallucinate, and we certainly don't want to show those results to our users. Hence, we have to run validation checks on our LLM responses, catch the cases where they go wrong, and retry the LLM. This process repeats until the output passes all the validation checks.
Problem: The workflow of our hypothetical Q&A application goes like this:
- The user enters a question.
- The query is converted into an embedding, and the relevant sections from the documentation are retrieved using nearest-neighbour search (a minimal sketch of this step follows the list).
- The original query and the retrieved sections are passed to a language model (LM), along with a custom prompt, to generate a response.
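For reference, here is a minimal sketch of the embedding-and-retrieval step described above. This step is not part of the notebook itself (we work from a pre-built log dataset below), and the embedding model name and helper functions are our own assumptions:
import numpy as np
import openai

def embed(texts):
    # Embed a batch of strings with the OpenAI embeddings endpoint (legacy SDK style)
    resp = openai.Embedding.create(model="text-embedding-ada-002", input=texts)
    return np.array([item["embedding"] for item in resp["data"]])

def retrieve(question, sections, top_k=3):
    # Nearest-neighbour search: rank sections by cosine similarity to the query
    vectors = embed([question] + sections)
    query, docs = vectors[0], vectors[1:]
    scores = docs @ query / (np.linalg.norm(docs, axis=1) * np.linalg.norm(query))
    return [sections[i] for i in np.argsort(scores)[::-1][:top_k]]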
Solution: We will illustrate how to use the UpTrain Validation framework to validate the chatbot's responses. We will use a dataset built from logs generated by a chatbot made to answer questions from the Streamlit user documentation.
Validation Logic: We will check whether the LLM response for a given query is empty. If it is, we return a default message instead of the LLM response.
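Conceptually, this is the loop the framework will automate for us. Below is a plain-Python sketch only (get_model_response is the completion function we define later in this example):
def answer_with_retries(inputs, max_retries=2):
    # Query the model and retry while the validation check fails
    for _ in range(max_retries + 1):
        response = get_model_response(inputs)
        if response.strip() != "<EMPTY MESSAGE>":  # validation check passed
            return response
    # Every attempt failed, so fall back to a default message
    return f"We couldn't find a good enough answer for: {inputs['question']}"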
Install UpTrain with all dependencies
pip install uptrain
uptrain-add --feature full
Make sure to define openai_api_key
import os
import openai
import polars as pl
import json
This notebook uses the OpenAI API to generate text for prompts; make sure the OPENAI_API_KEY environment variable is populated with your API key.
os.environ["OPENAI_API_KEY"] = "..."
Let’s first define our prompt and model
We have designed a prompt template that takes in a question and a document and asks the model to quote the relevant sections from the document.
prompt_template = """
You are a developer assistant that can only quote text from documents.
You will be given a section of technical documentation titled {document_title}.
The input is: '{question}?'.
Your task is to answer the question by quoting exactly all sections of the document that are relevant to any topics of the input.
Copy the text exactly as found in the original document.
Okay, here is the document:
--- START: Document ---
{document_text}
--- END: Document ---
Now do the task. If there are no relevant sections, just respond with \"<EMPTY MESSAGE>\".
Here is the answer:
"""
Let’s now load our dataset and see how that looks
url = "https://oodles-dev-training-data.s3.us-west-1.amazonaws.com/qna-streamlit-docs.jsonl"
dataset_path = os.path.join("datasets", "qna-notebook-data.jsonl")
if not os.path.exists(dataset_path):
    import httpx

    # Download the dataset if it isn't present locally
    os.makedirs("datasets", exist_ok=True)
    r = httpx.get(url)
    with open(dataset_path, "wb") as f:
        f.write(r.content)

dataset = pl.read_ndjson(dataset_path).select(
    pl.col(["question", "document_title", "document_text"])
)
print("Number of test cases: ", len(dataset))
print("Couple of samples: ", dataset[0:2])
Let's now get responses from our LLM by defining our completion function. We use the gpt-3.5-turbo model.
def get_model_response(input_dict):
    # Fill the prompt template with the question and document, then query the model
    prompt = [{"role": "system", "content": prompt_template.format(**input_dict)}]
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo", messages=prompt, temperature=0.1
    )
    message = response.choices[0]["message"]["content"]
    return message
Now that we have completed the setup, let’s try out a few examples to see how they look.
for idx in [0, 1, 5]:
    print(
        json.dumps(
            {
                "input_question": dataset["question"][idx],
                "llm_response": get_model_response(dataset.to_dicts()[idx]),
            },
            indent=1,
        ),
        "\n",
    )
As we can see, our model gives us empty responses in certain cases. Let's see how we can use the UpTrain Validation Framework to catch these cases and retry the LLM whenever that happens.
Using Validation Framework to check for empty responses
Defining the Validation Checks
Let's define a Check to evaluate whether the model response is empty. We use the pre-built TextComparison operator for this. After running the check on our input data, a new column called 'is_empty_response' is created.
from uptrain.framework import Check
from uptrain.operators import TextComparison
check = Check(
    name="empty_response_validation",
    operators=[
        TextComparison(
            reference_texts="<EMPTY MESSAGE>",
            col_in_text="response",
            col_out="is_empty_response",
        ),
    ],
)
Defining the passing condition
Our pass condition is "any response that is not empty". UpTrain provides a wrapper called Signal which allows us to express the pass condition using Python operators (such as ~, &, |, and +).
from uptrain.framework import Signal
pass_condition = ~Signal("is_empty_response")
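Signals can also be combined with these operators. For illustration only (the 'is_too_long' column is hypothetical and is not produced by our check, and this combined condition is not used below), a composite condition could look like this:
# Illustration only: require the response to be neither empty nor flagged as
# too long ('is_too_long' is a hypothetical column, not computed above)
combined_condition = ~Signal("is_empty_response") & ~Signal("is_too_long")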
Defining the retry logic
Let's define the retry logic, which dictates how to generate the LLM response when validation fails. This could be any Python function, for example one that modifies the prompt, changes the temperature, triggers a tool, or returns a default response (an alternative sketch follows the code below).
def model_response_when_empty(input_dict):
    return f"We couldn't find a good enough answer for the given question: {input_dict['question']}. Please try asking a different question"
# Call 'model_response_when_empty' when response is empty
retry_logic = [
    {
        "name": "default_output_when_response_is_empty",
        "signal": Signal("is_empty_response"),
        "completion_function": model_response_when_empty,
    }
]
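The retry completion function does not have to return a canned message. As a sketch of one of the alternatives mentioned above, the entry could re-query the model at a higher temperature. Note that retry_with_higher_temperature and retry_logic_alt are our own names, not part of UpTrain, and are not used in the rest of this example:
def retry_with_higher_temperature(input_dict):
    # Re-run the same prompt with a higher temperature to encourage a non-empty answer
    prompt = [{"role": "system", "content": prompt_template.format(**input_dict)}]
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo", messages=prompt, temperature=0.8
    )
    return response.choices[0]["message"]["content"]

retry_logic_alt = [
    {
        "name": "higher_temperature_when_response_is_empty",
        "signal": Signal("is_empty_response"),
        "completion_function": retry_with_higher_temperature,
    }
]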
Tying everything together
UpTrain provides a ValidationManager class to which we pass the Check, the completion_function, the retry_logic, and the pass_condition. Instead of calling the completion_function directly, we call the validation_manager. Under the hood, it computes the check, evaluates the pass condition, and, if the condition fails, applies the retry logic until it obtains a valid LLM response.
from validation_wrapper import ValidationManager
validation_manager = ValidationManager(
    check=check,
    completion_function=get_model_response,
    retry_logic=retry_logic,
    pass_condition=pass_condition,
)
validation_manager.setup()
Let’s run our example
Finally, let's run it on a few values from our input dataset.
for inputs in dataset.to_dicts()[:20]:
    validated_response = validation_manager.run(inputs)
    print("\n")