Ground Truth Comparison Evals
Response Matching
Grades how well the response generated by the LLM aligns with the provided ground truth.
Response Matching compares the LLM-generated response with the gold (ideal) response using the selected matching method (llm, exact, or rouge).
Columns required:
question: The question asked by the user
response: The response given by the model
ground_truth: The ideal response
Optional Parameters:
method: Different methods to check for response matching
llm (default): Uses an LLM to check if the response matches the ground truth
exact: Checks if the response is exactly the same as the ground truth
rouge: Uses the ROUGE score to check if the response matches the ground truth
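To make the non-LLM methods concrete, here is a minimal sketch of what exact and ROUGE-based matching do conceptually. This is not UpTrain's internal implementation; it assumes the rouge_score package (pip install rouge-score) purely for illustration.

from rouge_score import rouge_scorer

def exact_match(response: str, ground_truth: str) -> float:
    # 'exact': full credit only when the strings are identical
    return 1.0 if response == ground_truth else 0.0

def rouge_match(response: str, ground_truth: str) -> float:
    # 'rouge': overlap-based similarity; ROUGE-L F-measure used here
    scorer = rouge_scorer.RougeScorer(["rougeL"], use_stemmer=True)
    return scorer.score(ground_truth, response)["rougeL"].fmeasure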
How to use it?
from uptrain import EvalLLM, ResponseMatching

OPENAI_API_KEY = "sk-********************"  # Insert your OpenAI key here

data = [{
    "question": "Who were the two finalists of the 2023 ICC Cricket World Cup?",
    "ground_truth": "The finalists of the 2023 ICC Cricket World Cup were India and Australia.",
    "response": "Australia was a finalist in the 2023 ICC Cricket World Cup."
}]

eval_llm = EvalLLM(openai_api_key=OPENAI_API_KEY)

res = eval_llm.evaluate(
    data=data,
    checks=[ResponseMatching(method="llm")]  # method: "llm" / "exact" / "rouge"
)
By default, UpTrain uses GPT-3.5 Turbo for evaluations. If you want to use a different model, check out this tutorial.
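The evaluate call returns one result dictionary per input row, so you can inspect the scores directly (json is used here only for readable printing):

import json

# res is a list with one result dictionary per input row
print(json.dumps(res, indent=2))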
Sample Response:
[
{
"response_match_precision": 1.0,
"response_match_recall": 0.5,
"score_response_match": 0.57,
"response_match_method": "llm"
}
]
A higher response matching score indicates that the generated response closely matches the ground truth.
Here, the generated response only mentions Australia being a finalist and omits India, so it covers just one of the two finalists named in the ground truth. This shows up in the sample output: precision is 1.0 (everything in the response is supported by the ground truth), while recall is only 0.5 (half of the ground truth is covered), resulting in a low overall response matching score.
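The combined score in the sample output appears to weight recall more heavily than precision. As a rough, unofficial reconstruction (not UpTrain's documented formula), an F-beta score with beta^2 = 3 happens to reproduce the sample value:

def combined_score(precision: float, recall: float, beta2: float = 3.0) -> float:
    # F-beta style mean; beta2 = 3.0 is an assumption that reproduces
    # the sample output, not a documented UpTrain constant
    return (1 + beta2) * precision * recall / (beta2 * precision + recall)

print(round(combined_score(1.0, 0.5), 2))  # 0.57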