```python
from uptrain import EvalLLM, ResponseMatching

OPENAI_API_KEY = "sk-********************"  # Insert your OpenAI key here

data = [{
    "question": "Who were the two finalists of the 2023 ICC Cricket World Cup?",
    "ground_truth": "The finalists of the 2023 ICC Cricket World Cup were India and Australia.",
    "response": "Australia was a finalist in the 2023 ICC Cricket World Cup."
}]

eval_llm = EvalLLM(openai_api_key=OPENAI_API_KEY)

res = eval_llm.evaluate(
    data=data,
    checks=[ResponseMatching(method='llm')]  # method: llm/exact/rouge
)
```
By default, we use GPT-3.5 Turbo for evaluations. If you want to use a different model, check out this tutorial.
A higher response matching score indicates that the generated response agrees with the ground truth.
The generated response states only that Australia was a finalist and does not match the ground truth, which names both finalists of the 2023 ICC Cricket World Cup. Hence, it receives a low response matching score.
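To build intuition for why a partial answer scores low, here is a minimal, illustrative sketch of string-equality and token-overlap matching in the spirit of the 'exact' and 'rouge' methods. This is not UpTrain's actual implementation; the function names and tokenization are assumptions for illustration only.

```python
def exact_match(response: str, ground_truth: str) -> float:
    # Illustrative 'exact' method: 1.0 only when the strings are identical.
    return 1.0 if response.strip() == ground_truth.strip() else 0.0

def token_recall(response: str, ground_truth: str) -> float:
    # Illustrative 'rouge'-style recall: fraction of ground-truth tokens
    # that also appear in the response.
    gt_tokens = ground_truth.lower().rstrip(".").split()
    resp_tokens = set(response.lower().rstrip(".").split())
    return sum(1 for tok in gt_tokens if tok in resp_tokens) / len(gt_tokens)

ground_truth = "The finalists of the 2023 ICC Cricket World Cup were India and Australia."
response = "Australia was a finalist in the 2023 ICC Cricket World Cup."

print(exact_match(response, ground_truth))                 # 0.0 -- not an exact match
print(round(token_recall(response, ground_truth), 2))      # partial overlap, well below 1.0
```

Because the response omits India, several ground-truth tokens go unmatched and the overlap score stays well below 1.0, mirroring the low score reported above.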