question
: The question asked by the user

response
: The response given by the model

ground_truth
: The ideal response

method
: The method used to check whether the response matches the ground truth. One of:

    llm (default)
    : Uses an LLM to check if the response matches the ground truth

    exact
    : Checks if the response is exactly the same as the ground_truth

    rouge
    : Uses the ROUGE score to check if the response matches the ground truth
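
To make the difference between the `exact` and `rouge` methods concrete, here is a minimal sketch in plain Python. It is an illustration only, not the library's implementation: the function names `exact_match` and `unigram_overlap_f1` are hypothetical, and the overlap score is a simplified ROUGE-1-style F1 over lowercase whitespace tokens rather than a full ROUGE computation. The `llm` method is not sketched because it depends on a model call.

```python
from collections import Counter


def exact_match(response: str, ground_truth: str) -> float:
    """Return 1.0 only when the two strings are identical, else 0.0."""
    return 1.0 if response == ground_truth else 0.0


def unigram_overlap_f1(response: str, ground_truth: str) -> float:
    """ROUGE-1-style F1 over lowercase whitespace tokens (a simplification)."""
    resp_tokens = Counter(response.lower().split())
    truth_tokens = Counter(ground_truth.lower().split())
    # Counter intersection keeps the minimum count of each shared token.
    overlap = sum((resp_tokens & truth_tokens).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(resp_tokens.values())
    recall = overlap / sum(truth_tokens.values())
    return 2 * precision * recall / (precision + recall)
```

Under these assumptions, an exact check is all-or-nothing (even a difference in letter case yields 0.0), while the overlap score gives partial credit when the response shares only some words with the ground truth.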
How to use it?
By default, we use GPT-3.5 Turbo for evaluations. If you want to use a different model, check out this tutorial.
A higher response matching score indicates that the generated response matches the ground truth more closely.