Ground Truth Comparison Evals
Response Matching
Grades how well the response generated by the LLM aligns with the provided ground truth.
Response Matching compares the LLM-generated text with the gold (ideal) response using the selected scoring method.
Columns required:
question: The question asked by the user
response: The response given by the model
ground_truth: The ideal response
Optional Parameters:
method: Different methods to check for response matching
  llm (default): Uses an LLM to check if the response matches the ground truth
  exact: Checks if the response is exactly the same as the ground_truth
  rouge: Uses the ROUGE score to check if the response matches the ground truth
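For intuition, the exact and rouge methods can be approximated locally. The sketch below is purely illustrative and is not necessarily how the eval computes the score internally; it uses the rouge_score package for the ROUGE-based comparison.

```python
# Illustrative sketch of the "exact" and "rouge" matching methods.
# This is an approximation for intuition; the library's internal implementation may differ.
from rouge_score import rouge_scorer


def exact_match_score(response: str, ground_truth: str) -> float:
    # "exact": 1.0 only if the response is exactly the same as the ground_truth
    return 1.0 if response.strip() == ground_truth.strip() else 0.0


def rouge_match_score(response: str, ground_truth: str) -> float:
    # "rouge": ROUGE-L F1 overlap between the response and the ground truth
    scorer = rouge_scorer.RougeScorer(["rougeL"], use_stemmer=True)
    return scorer.score(ground_truth, response)["rougeL"].fmeasure


response = "Australia reached the final of the 2023 ICC Cricket World Cup."
ground_truth = "India and Australia played in the final of the 2023 ICC Cricket World Cup."
print(exact_match_score(response, ground_truth))  # 0.0 (not an exact match)
print(rouge_match_score(response, ground_truth))  # partial overlap, well below 1.0
```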
How to use it?
By default, we use GPT-3.5 Turbo for evaluations. If you want to use a different model, check out this tutorial.
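A minimal usage sketch, assuming an UpTrain-style EvalLLM / ResponseMatching API (the class names, parameters, and data values below are assumptions for illustration and may differ in your version):

```python
# Sketch assuming an UpTrain-style EvalLLM / ResponseMatching API; names may differ by version.
from uptrain import EvalLLM, ResponseMatching

OPENAI_API_KEY = "sk-***"  # placeholder key

# Each row provides the required columns: question, response, ground_truth
data = [{
    "question": "Which teams played in the final of the 2023 ICC Cricket World Cup?",
    "response": "Australia reached the final of the 2023 ICC Cricket World Cup.",
    "ground_truth": "India and Australia played in the final of the 2023 ICC Cricket World Cup.",
}]

eval_llm = EvalLLM(openai_api_key=OPENAI_API_KEY)

results = eval_llm.evaluate(
    data=data,
    checks=[ResponseMatching(method="llm")],  # or method="exact" / method="rouge"
)

print(results)
```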
Sample Response:
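The sample below is illustrative only: the score value, explanation text, and exact field names are assumptions reconstructed from the description that follows.

```json
{
  "question": "Which teams played in the final of the 2023 ICC Cricket World Cup?",
  "response": "Australia reached the final of the 2023 ICC Cricket World Cup.",
  "ground_truth": "India and Australia played in the final of the 2023 ICC Cricket World Cup.",
  "score_response_match": 0.0,
  "explanation": "The response mentions only Australia and omits the other finalist named in the ground truth."
}
```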
A higher response matching score indicates that the generated response closely matches the ground truth.
In the sample above, the generated response only mentions Australia as a finalist and does not cover both finalists of the 2023 ICC Cricket World Cup named in the ground truth, hence the low response matching score.