Response completeness score measures if the generated response has adequately answered all aspects to the question being asked.

This check is important to ensure that the model is not generating incomplete responses.

Columns required:

  • question: The question asked by the user
  • response: The response given by the model

How to use it?

from uptrain import EvalLLM, Evals

OPENAI_API_KEY = "sk-********************"  # Insert your OpenAI key here

data = [{
    "question": "Where is the Taj Mahal located? Also, when was it built?",
    "response": "The Taj Mahal is located in Agra, Uttar Pradesh, India"
}]

eval_llm = EvalLLM(openai_api_key=OPENAI_API_KEY)

res = eval_llm.evaluate(
    data = data,
    checks = [Evals.RESPONSE_COMPLETENESS]
)
By default, we are using GPT 3.5 Turbo for evaluations. If you want to use a different model, check out this tutorial.

Sample Response:

[
   {
      "score_response_completeness": 0.5,
      "explanation_response_completeness": "The LLM response answers a few aspects by providing the location of the Taj Mahal (Agra, Uttar Pradesh, India) but fails to answer all the critical aspects of the user query as it completely ignores the second part of the query which asks for the construction date of the Taj Mahal. The user will not be highly satisfied with this answer as it only addresses one part of the query and leaves out important information."
   }
]
A higher response completeness score reflects that the response has answered all aspects of the user’s questions to a greater extent.

The question can be divided in following 2 parts:

  1. Where is the Taj Mahal located?
  2. When was the Taj Mahal built?

Though the response provides answer to “Where is the Taj Mahal located?”, it does not state when was it built.

Ultimately, resulting in a low response completeness score.

How it works?

We evaluate response completeness by determining which of the following three cases apply for the given task data:

  • The generated answer doesn’t answer the given question at all.
  • The generated answer only partially answers the given question.
  • The generated answer adequately answers the given question.