import json
from uptrain import EvalLLM, Evals, Settings
Create your data
You can define your data as a list of dictionaries to run evaluations on UpTrain. Each dictionary contains the following keys:
question: The question you want to ask
context: The context relevant to the question
response: The response to the question
data = [
    {
        'question': 'Which is the most popular global sport?',
        'context': "The popularity of sports can be measured in various ways, including TV viewership, social media presence, number of participants, and economic impact. Football is undoubtedly the world's most popular sport with major events like the FIFA World Cup and sports personalities like Ronaldo and Messi, drawing a followership of more than 4 billion people. Cricket is particularly popular in countries like India, Pakistan, Australia, and England. The ICC Cricket World Cup and Indian Premier League (IPL) have substantial viewership. The NBA has made basketball popular worldwide, especially in countries like the USA, Canada, China, and the Philippines. Major tennis tournaments like Wimbledon, the US Open, French Open, and Australian Open have large global audiences. Players like Roger Federer, Serena Williams, and Rafael Nadal have boosted the sport's popularity. Field Hockey is very popular in countries like India, Netherlands, and Australia. It has a considerable following in many parts of the world.",
        'response': 'Football is the most popular sport with around 4 billion followers worldwide'
    }
]
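Before sending rows to UpTrain, it can help to confirm that each one has the three fields described above. The following is a minimal sketch; validate_rows and REQUIRED_KEYS are hypothetical names that are not part of UpTrain, and the check only verifies that each field is a non-empty string.

```python
# Hypothetical helper (not part of UpTrain): checks that every row has the
# three string fields used by the evaluations in this guide.
REQUIRED_KEYS = ("question", "context", "response")


def validate_rows(rows):
    """Return a list of problems found in rows; an empty list means the data looks usable."""
    if not isinstance(rows, list):
        return ["data must be a list of dictionaries"]
    problems = []
    for i, row in enumerate(rows):
        if not isinstance(row, dict):
            problems.append(f"row {i} is not a dictionary")
            continue
        for key in REQUIRED_KEYS:
            value = row.get(key)
            # A missing key, a non-string, or a blank string all count as problems
            if not isinstance(value, str) or not value.strip():
                problems.append(f"row {i} is missing a non-empty '{key}' string")
    return problems


# Usage with a shortened version of the row defined above
sample = [{
    "question": "Which is the most popular global sport?",
    "context": "Football is undoubtedly the world's most popular sport.",
    "response": "Football is the most popular sport with around 4 billion followers worldwide",
}]
print(validate_rows(sample))  # []
```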
The model name should start with mistral/ so that UpTrain recognizes that you are using Mistral. For example, if you are using mistral-tiny, the model name should be mistral/mistral-tiny.
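The code that configures the evaluator is not shown in this section. The sketch below shows one way it might look, assuming UpTrain's Settings accepts model and mistral_api_key keyword arguments and that your key is stored in a MISTRAL_API_KEY environment variable; verify both against the UpTrain documentation. to_uptrain_model_name and make_evaluator are hypothetical helpers that apply the mistral/ prefix rule described above.

```python
import os

MISTRAL_PREFIX = "mistral/"


def to_uptrain_model_name(model: str) -> str:
    """Prefix a Mistral model id with 'mistral/' unless it already has it."""
    return model if model.startswith(MISTRAL_PREFIX) else MISTRAL_PREFIX + model


def make_evaluator(model: str = "mistral-tiny"):
    """Sketch of building an UpTrain evaluator for a Mistral model.

    Assumptions: the API key is in the MISTRAL_API_KEY environment variable,
    and Settings accepts model= and mistral_api_key= keyword arguments.
    """
    # Imported inside the function so the helper above works without uptrain installed
    from uptrain import EvalLLM, Settings

    settings = Settings(
        model=to_uptrain_model_name(model),
        mistral_api_key=os.environ["MISTRAL_API_KEY"],
    )
    return EvalLLM(settings)


print(to_uptrain_model_name("mistral-tiny"))  # mistral/mistral-tiny
```

Applying the prefix in one helper means a bare id such as mistral-tiny and an already prefixed mistral/mistral-tiny both end up in the form UpTrain expects.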
Evaluate data using UpTrain
Now that we have our data, we can evaluate it with UpTrain's evaluate method, which takes the following arguments:
data: The data you want to log and evaluate
checks: The evaluations you want to perform on your data
We have used the following two metrics from UpTrain’s library:
Context Relevance: Evaluates how relevant the retrieved context is to the question specified.
Response Relevance: Evaluates how relevant the generated response is to the question specified.
You can find the complete list of UpTrain’s supported metrics in the UpTrain documentation.
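A minimal sketch of the evaluate call with the two checks listed above, assuming Evals.CONTEXT_RELEVANCE and Evals.RESPONSE_RELEVANCE are the UpTrain enum members for these metrics. run_checks and StubEvaluator are hypothetical: the evaluator is passed in as a parameter so that a stub can stand in for EvalLLM when no API key is available.

```python
def run_checks(eval_llm, data, checks=None):
    """Evaluate data with the Context Relevance and Response Relevance checks.

    eval_llm is assumed to be an UpTrain EvalLLM configured for Mistral; any object
    with an evaluate(data=..., checks=...) method works, which keeps this testable.
    """
    if checks is None:
        # Imported lazily so the function can be exercised without uptrain installed
        from uptrain import Evals

        checks = [Evals.CONTEXT_RELEVANCE, Evals.RESPONSE_RELEVANCE]
    return eval_llm.evaluate(data=data, checks=checks)


class StubEvaluator:
    """Hypothetical offline stand-in for EvalLLM: echoes each row with a fixed score."""

    def evaluate(self, data, checks):
        return [dict(row, checks_run=list(checks), score_stub=1.0) for row in data]


stub_results = run_checks(StubEvaluator(), [{"question": "q"}], checks=["context_relevance"])
print(stub_results)  # [{'question': 'q', 'checks_run': ['context_relevance'], 'score_stub': 1.0}]
```

With a real evaluator, results = run_checks(eval_llm, data) would produce the results list printed in the next step.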
Print the results
print(json.dumps(results, indent=3))
Sample response:
[
   {
      "question": "Which is the most popular global sport?",
      "context": "The popularity of sports can be measured in various ways, including TV viewership, social media presence, number of participants, and economic impact. Football is undoubtedly the world's most popular sport with major events like the FIFA World Cup and sports personalities like Ronaldo and Messi, drawing a followership of more than 4 billion people. Cricket is particularly popular in countries like India, Pakistan, Australia, and England. The ICC Cricket World Cup and Indian Premier League (IPL) have substantial viewership. The NBA has made basketball popular worldwide, especially in countries like the USA, Canada, China, and the Philippines. Major tennis tournaments like Wimbledon, the US Open, French Open, and Australian Open have large global audiences. Players like Roger Federer, Serena Williams, and Rafael Nadal have boosted the sport's popularity. Field Hockey is very popular in countries like India, Netherlands, and Australia. It has a considerable following in many parts of the world.",
      "response": "Football is the most popular sport with around 4 billion followers worldwide",
      "score_context_relevance": 1.0,
      "explanation_context_relevance": "{\n\"Reasoning\": \"The given context mentions that football is the world's most popular sport based on its followership of over 4 billion people. The context also mentions other sports like cricket, basketball, tennis, and field hockey, but it does not provide any information that would challenge the statement that football is the most popular sport. Therefore, the extracted context can answer the given query completely.\",\n\"Choice\": \"A\"\n}",
      "score_response_relevance": 1.0,
      "explanation_response_relevance": "Response Precision: 1.0{\n \"Reasoning\": \"The given response provides the accurate answer to the question and does not include any irrelevant information.\",\n \"Choice\": \"A\"\n}\nResponse Recall: 1.0{\n \"Reasoning\": \"The given response adequately answers the given question by stating that football is the most popular sport with approximately 4 billion followers.\",\n \"Choice\": \"A\"\n}"
   }
]
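Each result row echoes the input fields and adds a score and an explanation per check, as in the sample response. A small, hypothetical helper such as summarize_scores can reduce the output to just the scores; it assumes score fields always use the score_ prefix seen in the sample.

```python
import json


def summarize_scores(results):
    """Collect every score_* field from each result row into a compact dict."""
    summary = []
    for row in results:
        # Strip the assumed "score_" prefix to get the metric name
        scores = {
            key[len("score_"):]: value
            for key, value in row.items()
            if key.startswith("score_")
        }
        summary.append({"question": row.get("question"), "scores": scores})
    return summary


# Trimmed copy of the sample response above (explanations and context omitted)
sample_results = [{
    "question": "Which is the most popular global sport?",
    "response": "Football is the most popular sport with around 4 billion followers worldwide",
    "score_context_relevance": 1.0,
    "score_response_relevance": 1.0,
}]
print(json.dumps(summarize_scores(sample_results), indent=3))
```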