Query Clarity Evals
Multi-Query Accuracy
Evaluates how accurately the variations of the query represent the same question.
Columns required:
question: The question asked by the user
variants: Sub-questions generated from the question
How to use it?
By default, we use GPT-3.5 Turbo for evaluations. If you want to use a different model, check out this tutorial.
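As a minimal sketch of the input format, the two required columns can be supplied as a list of dictionaries. The schema below is an assumption for illustration; the exact invocation API of the evaluation framework may differ, and the example questions are invented.

```python
# Hypothetical input rows: each row must carry the two required columns.
data = [
    {
        "question": "What are the health benefits of green tea?",
        "variants": (
            "Does drinking green tea improve health? "
            "What positive effects does green tea have on the body?"
        ),
    }
]

def validate_row(row: dict) -> bool:
    """Check that a row has both required columns as non-empty strings."""
    return all(
        isinstance(row.get(key), str) and row[key]
        for key in ("question", "variants")
    )

print(all(validate_row(row) for row in data))  # True
```

Rows missing either column (or carrying empty strings) would fail this check before being sent to the judge model.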
Sample Response:
A higher Multi-Query Accuracy score reflects that the generated variants accurately represent the main question. A lower score indicates that the variants do not cover all the aspects of the main question.
How does it work?
We evaluate Multi-Query Accuracy by determining which of the following three cases applies to the given task data:
- The given variations mean the same as the original question.
- The given variations partially mean the same as the original question.
- The given variations do not mean the same as the original question.
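The three cases above can be sketched as a mapping from the judge model's verdict to a numeric score. The case labels and score values here are assumptions for illustration, not the framework's actual output format; the point is only that "same" scores highest and "different" lowest, matching the interpretation given under Sample Response.

```python
# Hypothetical case labels mapped to a Multi-Query Accuracy score.
CASE_SCORES = {
    "same": 1.0,       # variations mean the same as the original question
    "partial": 0.5,    # variations partially mean the same
    "different": 0.0,  # variations do not mean the same
}

def multi_query_accuracy(judged_case: str) -> float:
    """Convert the judge model's case label into a score in [0, 1]."""
    if judged_case not in CASE_SCORES:
        raise ValueError(f"unknown case: {judged_case!r}")
    return CASE_SCORES[judged_case]

print(multi_query_accuracy("same"))  # 1.0
```

Under this sketch, a set of variants judged to fully preserve the original question's meaning scores 1.0, while variants that drift to a different question score 0.0.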