Tonality Score evaluates the response in terms of the tone used when following or deviating from standard guidelines.

It aims to ensure that the generated response not only adheres to guidelines but also communicates its adherence or deviations in an appropriate and respectful manner.

Columns required:

  • response: The response given by the model
  • llm_persona: The persona the LLM being assessed was instructed to follow

How to use it?

from uptrain import EvalLLM, CritiqueTone

OPENAI_API_KEY = "sk-********************"  # Insert your OpenAI key here

data = [{
    "response": "Balancing a chemical equation is like creating a chemical masterpiece! Just sprinkle some coefficients here and there until you've got the perfect formula dance. It's a choreography of atoms."
}]

persona = "methodical teacher"  # Define LLM persona

eval_llm = EvalLLM(openai_api_key=OPENAI_API_KEY)

# Run the tone check against the persona defined above
res = eval_llm.evaluate(
    data=data,
    checks=[CritiqueTone(llm_persona=persona)]
)
By default, we are using GPT 3.5 Turbo for evaluations. If you want to use a different model, check out this tutorial.

Sample Response:

[
   {
      "response": "Balancing a chemical equation is like creating a chemical masterpiece! Just sprinkle some coefficients here and there until you've got the perfect formula dance. It's a choreography of atoms.",
      "score_tone": 0.4,
      "explanation_tone": "The provided response does not align with the specified persona of a methodical teacher. The use of metaphor and casual language does not reflect the methodical and systematic approach expected from a teacher in this persona.\n\n[Score]: 2"
   }
]
A higher tonality score indicates that the generated response aligns more closely with the intended persona.

In this example, the tone of the generated response does not match the tone a “methodical teacher” would use, which results in a low tonality score.
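
Since each result carries a numeric score_tone, low-scoring responses can be picked out programmatically for review. The snippet below is a minimal sketch that assumes results have the shape shown in the sample response; the flag_off_tone helper and the 0.5 cutoff are hypothetical and are not part of UpTrain.

```python
# Result in the same shape as the sample response (text shortened here).
results = [
    {
        "response": "Balancing a chemical equation is like creating a chemical masterpiece!",
        "score_tone": 0.4,
        "explanation_tone": "The provided response does not align with the specified persona.",
    }
]

# Hypothetical cutoff chosen for illustration; not an UpTrain default.
TONE_THRESHOLD = 0.5


def flag_off_tone(rows, threshold=TONE_THRESHOLD):
    """Return the rows whose score_tone is missing or below the threshold."""
    flagged = []
    for row in rows:
        score = row.get("score_tone")
        if score is None or score < threshold:
            flagged.append(row)
    return flagged


# Print a short summary of every flagged response.
for row in flag_off_tone(results):
    print(f"{row['score_tone']}: {row['response'][:60]}")
```

Rows without a score are flagged as well, so that failed evaluations are not silently treated as passing. Pick the threshold to suit how strictly the persona needs to be enforced.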

How does it work?

We evaluate tonality by determining which of the following three cases applies to the given task data:

  • The response aligns well with the specified persona.
  • The response aligns moderately with the specified persona.
  • The response doesn’t align with the specified persona at all.