question
: The question asked by the user
context
: Information retrieved to answer the question
response
: The response given by the model
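For illustration, a single row of input data could be held as a Python dict with these three keys. The `ConsistencyInput` record type and the `validate_rows` helper below are hypothetical and are not part of any evaluation library's API. They only sketch the shape of the data and check that each field is present.

```python
from typing import TypedDict, cast

class ConsistencyInput(TypedDict):
    # Hypothetical record type mirroring the three input fields described above.
    question: str
    context: str
    response: str

REQUIRED_KEYS = ("question", "context", "response")

def validate_rows(rows: list[dict]) -> list[ConsistencyInput]:
    """Check that every row carries the three required string fields."""
    for i, row in enumerate(rows):
        missing = [k for k in REQUIRED_KEYS if k not in row]
        if missing:
            raise ValueError(f"row {i} is missing fields: {missing}")
        for k in REQUIRED_KEYS:
            if not isinstance(row[k], str):
                raise TypeError(f"row {i}: field {k!r} must be a string")
    return cast(list[ConsistencyInput], rows)

data = [
    {
        "question": "What is the capital of France?",
        "context": "Paris is the capital and largest city of France.",
        "response": "The capital of France is Paris.",
    }
]
validated = validate_rows(data)
```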
How to use it?
By default, we use GPT-3.5 Turbo for evaluations. If you want to use a different model, check out this tutorial.
A higher response consistency score indicates that the generated response is better aligned with both the question asked and the context provided.
How does it work?
We evaluate response consistency through the following steps:
- Generating an argument as to why the given response is appropriate for the question asked.
- Rating the generated argument on a scale of 0 to 1, based on how logical the argument seems.
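The two steps above can be sketched as a small pipeline around a generic "judge" function that sends a prompt to an LLM and returns its text reply. This is a minimal sketch under stated assumptions: the `Judge` interface, the prompt wording, and the handling of unparseable or out-of-range replies are all hypothetical choices, not the prompts or code used by the actual evaluation. The `fake_judge` stub stands in for a real model call so the example runs offline.

```python
from typing import Callable

# Hypothetical interface: any function that takes a prompt and returns the LLM's reply.
Judge = Callable[[str], str]

def generate_argument(judge: Judge, question: str, context: str, response: str) -> str:
    """Step 1: ask the judge why the response is appropriate for the question."""
    prompt = (
        f"Question: {question}\n"
        f"Context: {context}\n"
        f"Response: {response}\n"
        "Write a short argument explaining why this response is appropriate "
        "for the question, using the context."
    )
    return judge(prompt)

def rate_argument(judge: Judge, argument: str) -> float:
    """Step 2: ask the judge how logical the argument is, as a number in [0, 1]."""
    prompt = (
        f"Argument: {argument}\n"
        "On a scale of 0 to 1, how logical is this argument? Reply with a number only."
    )
    reply = judge(prompt).strip()
    try:
        score = float(reply)
    except ValueError:
        # Assumption: treat an unparseable reply as the lowest score instead of failing.
        return 0.0
    # Clamp in case the judge answers outside the requested range.
    return min(1.0, max(0.0, score))

def response_consistency(judge: Judge, question: str, context: str, response: str) -> float:
    """Run both steps and return the final score."""
    argument = generate_argument(judge, question, context, response)
    return rate_argument(judge, argument)

def fake_judge(prompt: str) -> str:
    # Offline stand-in for a real LLM call.
    if prompt.startswith("Argument:"):
        return "0.9"
    return "The response names Paris, which the context states is the capital."

score = response_consistency(
    fake_judge,
    "What is the capital of France?",
    "Paris is the capital of France.",
    "The capital of France is Paris.",
)
print(score)  # 0.9
```

Separating the two calls keeps the argument available for inspection, which can help explain why a particular response received a low score.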