- question: The question asked by the user
- response: The response given by the model
How to use it?
By default, evaluations use GPT-3.5 Turbo. If you want to use a different model, check out this tutorial.
A higher response completeness score means the response answers more of the aspects raised in the user’s question.
- Where is the Taj Mahal located?
- When was the Taj Mahal built?
How it works?
We evaluate response completeness by determining which of the following three cases applies to the given task data:
- The generated answer doesn’t answer the given question at all.
- The generated answer only partially answers the given question.
- The generated answer adequately answers the given question.
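
The three cases above can be turned into a numeric score with a simple lookup. The following is a minimal sketch, not the library's actual implementation: the names `Completeness`, `CASE_TO_SCORE`, `completeness_score`, and `score_rows` are hypothetical, the 0.0 / 0.5 / 1.0 values are an assumed mapping chosen for illustration, and `classify` is a stand-in for the LLM call that picks one of the three cases.

```python
from enum import Enum


class Completeness(Enum):
    """The three cases the evaluation chooses between."""
    NOT_ANSWERED = "not_answered"
    PARTIALLY_ANSWERED = "partially_answered"
    ADEQUATELY_ANSWERED = "adequately_answered"


# Assumed mapping for illustration only; the real scores may differ.
CASE_TO_SCORE = {
    Completeness.NOT_ANSWERED: 0.0,
    Completeness.PARTIALLY_ANSWERED: 0.5,
    Completeness.ADEQUATELY_ANSWERED: 1.0,
}


def completeness_score(case: Completeness) -> float:
    """Return the numeric score assigned to a case."""
    return CASE_TO_SCORE[case]


def score_rows(rows, classify):
    """Score each row that has a `question` and a `response` key.

    `classify` is a hypothetical callable (question, response) -> Completeness
    that stands in for the LLM judge.
    """
    scored = []
    for row in rows:
        case = classify(row["question"], row["response"])
        scored.append({**row, "completeness_score": completeness_score(case)})
    return scored
```

For example, passing a stub such as `lambda q, r: Completeness.PARTIALLY_ANSWERED` as `classify` lets you exercise the scoring logic without calling a model.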