Conversation Evals
Query Resolution
Evaluates the ability of the LLM to resolve the user’s query.
A good AI assistant should address the user’s query effectively. Query resolution measures how well the assistant’s responses in a conversation actually resolve that query.
How to use it?
By default, we use GPT-3.5 Turbo. If you want to use a different model, check out this tutorial.
Sample Response:
A higher query resolution score reflects that the LLM effectively addresses the user’s query.
In this conversation, the nurse did not address the patient’s query about chest pain, which may indicate a medical emergency. This results in a low query resolution score.
How it works?
We evaluate query resolution by determining which of the following cases applies to the given task data:
- The given responses effectively resolve the user’s query.
- The given responses partially resolve the user’s query.
- The given responses do not resolve the user’s query.
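The three cases above can be turned into a numeric score with a simple lookup. The snippet below is a minimal sketch and not the library's actual implementation: the names `Resolution`, `ASSUMED_SCORES`, and `score_query_resolution` are hypothetical, and the 1.0 / 0.5 / 0.0 values are an assumption chosen only so that a higher score means better resolution, as described above.

```python
from enum import Enum


class Resolution(Enum):
    """The three cases the evaluator chooses between (names are hypothetical)."""
    RESOLVED = "resolved"      # responses effectively resolve the query
    PARTIAL = "partial"        # responses partially resolve the query
    UNRESOLVED = "unresolved"  # responses do not resolve the query


# Assumed score mapping: these exact values are not stated in this page.
ASSUMED_SCORES = {
    Resolution.RESOLVED: 1.0,
    Resolution.PARTIAL: 0.5,
    Resolution.UNRESOLVED: 0.0,
}


def score_query_resolution(case: Resolution) -> float:
    """Map the case selected by the evaluator to a numeric score."""
    return ASSUMED_SCORES[case]


if __name__ == "__main__":
    # The nurse example above falls in the "do not resolve" case.
    print(score_query_resolution(Resolution.UNRESOLVED))
```

Under this assumed mapping, the nurse conversation from the sample would fall in the third case and receive the lowest score.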