The response conciseness score measures whether the generated response contains additional information that is irrelevant to the question asked.

Columns required:

  • question: The question asked by the user
  • response: The response given by the model

How to use it?

from uptrain import EvalLLM, Evals

OPENAI_API_KEY = "sk-********************"  # Insert your OpenAI key here

data = [{
    "question": "What are the primary components of a cell?",
    "response": "The primary components of a cell are crucial for its function. Speaking of components, the integration of software components in modern applications is a key challenge for developers. It requires careful consideration of architectural patterns and design principles."
}]

eval_llm = EvalLLM(openai_api_key=OPENAI_API_KEY)

res = eval_llm.evaluate(
    data=data,
    checks=[Evals.RESPONSE_CONCISENESS]
)
By default, GPT-3.5 Turbo is used for evaluations. If you want to use a different model, check out this tutorial.

Sample Response:

[
   {
      "score_response_conciseness": 0.0,
      "explanation_response_conciseness": "The LLM response contains a lot of additional irrelevant information because it completely deviates from the user query about the primary components of a cell. Instead of providing relevant information about cell components, the response talks about software integration, architectural patterns, and design principles, which are not related to the user query at all. This additional information is not needed to answer the user's question and only serves to confuse and distract from the main topic."
   }
]
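Each entry in the returned list pairs a score with an explanation. A minimal sketch of working with that output, using the sample response above (explanation truncated) and an illustrative threshold of 0.5, which is an assumption rather than an UpTrain default:

```python
# Sample output of eval_llm.evaluate(), as shown above (explanation truncated).
res = [
    {
        "score_response_conciseness": 0.0,
        "explanation_response_conciseness": "The LLM response contains a lot of additional irrelevant information...",
    }
]

# Flag responses whose conciseness score falls below a chosen threshold.
# The 0.5 cutoff here is illustrative, not a built-in UpTrain value.
THRESHOLD = 0.5
flagged = [row for row in res if row["score_response_conciseness"] < THRESHOLD]

for row in flagged:
    print(row["explanation_response_conciseness"])
```

This pattern makes it easy to surface only the low-scoring responses for review.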

A higher response conciseness score indicates that the response is concise and does not contain irrelevant information.

The response above discusses software integration, architectural patterns, and design principles. This information is not relevant to the user's question, "What are the primary components of a cell?", resulting in a low response conciseness score.

How does it work?

We evaluate response conciseness by determining which of the following three cases applies to the given task data:

  • The generated answer has a lot of additional irrelevant information.
  • The generated answer has a little additional irrelevant information.
  • The generated answer has no additional irrelevant information.
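As an illustrative sketch only, the three cases could be mapped to numeric scores roughly as follows. The specific values here are an assumption for illustration; the exact mapping is internal to UpTrain:

```python
# Hypothetical mapping from the three evaluated cases to a conciseness score.
# The numeric values (0.0 / 0.5 / 1.0) are assumptions for illustration,
# not UpTrain's documented internal logic.
CASE_SCORES = {
    "a lot of additional irrelevant information": 0.0,
    "a little additional irrelevant information": 0.5,
    "no additional irrelevant information": 1.0,
}

def conciseness_score(case: str) -> float:
    """Return the illustrative score for one of the three cases."""
    return CASE_SCORES[case]
```

Under this sketch, a fully on-topic answer earns the highest score, while a response dominated by off-topic material (as in the cell-components example above) lands at the bottom of the scale.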