Context relevance score measures if the retrieved context has enough information to answer the question being asked.

This check is important since a bad context reduces the chances of the model giving a relevant response to the question asked, as well as leads to hallucinations.

You can read our blog to learn more about why context relevance is important.

Columns required:

  • question: The question asked by the user
  • context: Information retrieved to answer the question

How to use it?

from uptrain import EvalLLM, Evals

OPENAI_API_KEY = "sk-********************"  # Insert your OpenAI key here

data = [{
    "question": "What is the capital city of France?",
    "context": "France is a global center for art, fashion, gastronomy, and culture. Its 19th-century cityscape is crisscrossed by wide boulevards and the Seine River. The country is also known for its charming cafes, trendy boutiques, and rich history."
}]

eval_llm = EvalLLM(openai_api_key=OPENAI_API_KEY)

res = eval_llm.evaluate(
    data = data,
    checks = [Evals.CONTEXT_RELEVANCE]
)
By default, we are using GPT 3.5 Turbo for evaluations. If you want to use a different model, check out this tutorial.

Sample Response:

[
   {
      "score_context_relevance": 0.0,
      "explanation_context_relevance": "The extracted context does not contain any information related to the capital city of France. It only provides information about France being a global center for art, fashion, gastronomy, and culture, its 19th-century cityscape, wide boulevards, the Seine River, charming cafes, trendy boutiques, and rich history. There is no mention of the capital city of France in the given context, so it cannot answer the user query."
   }
]
A higher context relevance score reflects that retrieved context is relevant to the question asked.

The question is asking about “the capital city of France”.

The context though contains some information about France, there is no refeence to the capital city of France i.e. Paris.

Which makes the context irrelevant for the given question, ultimately resulting in a low context relevance score.

How it works?

We evaluate context relevance by determining which of the following three cases apply for the given task data:

  • The extracted context can answer the given query completely.
  • The extracted context can give some relevant answer for the given query, but can’t answer it completely.
  • The extracted context doesn’t contain any information to answer the given query.