Together AI provides fully managed API endpoints for open-source large language models (LLMs), allowing developers to build and run generative AI applications.
The best part is that you can use these open-source models directly without worrying about the underlying infrastructure.
How will this help?
They offer API endpoints for models like Llama-2, Mistral-7B, CodeLlama, and more.
You can use these endpoints to evaluate the performance of these models using UpTrain.
Before we start, you will need a Together AI API key. You can get it here.
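Once you have a key, one common way to keep it out of source code is to read it from an environment variable. The sketch below is only an illustration: the variable name `TOGETHER_API_KEY` and the helper `load_together_api_key` are assumptions made for this example and are not part of UpTrain.

```python
import os
from typing import Mapping, Optional


def load_together_api_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the API key from the environment, failing early if it is missing.

    `env` defaults to os.environ; passing a plain dict makes the helper easy to test.
    The variable name TOGETHER_API_KEY is an assumption of this sketch.
    """
    source = os.environ if env is None else env
    key = source.get("TOGETHER_API_KEY", "").strip()
    if not key:
        raise RuntimeError("Set the TOGETHER_API_KEY environment variable first")
    return key


# Usage (uncomment once the variable is set in your shell):
# TOGETHER_API_KEY = load_together_api_key()
```

Failing early with a clear message tends to be easier to debug than an authentication error raised in the middle of an evaluation run.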
How to integrate?
First, let’s import the necessary packages and define the Together AI API key:
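import json

from uptrain import EvalLLM, Evals, Settings

# Replace the placeholder with your own Together AI API key
TOGETHER_API_KEY = "*********************"
# The "together/" prefix tells UpTrain that this is a Together AI model
settings = Settings(model='together/mistralai/Mixtral-8x7B-Instruct-v0.1', together_api_key=TOGETHER_API_KEY)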
We will be using Mixtral-8x7B-Instruct-v0.1 for this example. You can find a full list of available models here.
Remember to add “together/” at the beginning of the model name to let UpTrain know that you are using a Together AI model.
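Because the prefix is easy to forget, it can help to add it in one place. The helper below is a minimal sketch; `to_uptrain_model_name` is a hypothetical name introduced for this example, not an UpTrain function.

```python
TOGETHER_PREFIX = "together/"


def to_uptrain_model_name(model_name: str) -> str:
    """Return model_name with the "together/" prefix, adding it only if missing."""
    name = model_name.strip()
    if not name:
        raise ValueError("model_name must not be empty")
    if name.startswith(TOGETHER_PREFIX):
        return name  # already prefixed, leave unchanged
    return TOGETHER_PREFIX + name


model = to_uptrain_model_name("mistralai/Mixtral-8x7B-Instruct-v0.1")
print(model)  # together/mistralai/Mixtral-8x7B-Instruct-v0.1
```

The resulting string can then be passed as the `model` argument of `Settings`, as in the snippet above.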
Let’s define a dataset on which we want to perform the evaluations:
data = [{
    'question': 'Which is the most popular global sport?',
    'context': "The popularity of sports can be measured in various ways, including TV viewership, social media presence, number of participants, and economic impact. Football is undoubtedly the world's most popular sport with major events like the FIFA World Cup and sports personalities like Ronaldo and Messi, drawing a followership of more than 4 billion people. Cricket is particularly popular in countries like India, Pakistan, Australia, and England. The ICC Cricket World Cup and Indian Premier League (IPL) have substantial viewership. The NBA has made basketball popular worldwide, especially in countries like the USA, Canada, China, and the Philippines. Major tennis tournaments like Wimbledon, the US Open, French Open, and Australian Open have large global audiences. Players like Roger Federer, Serena Williams, and Rafael Nadal have boosted the sport's popularity. Field Hockey is very popular in countries like India, Netherlands, and Australia. It has a considerable following in many parts of the world.",
    'response': 'Football is the most popular sport with around 4 billion followers worldwide'
}]
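Before sending rows for evaluation, it can be useful to confirm that each one carries the fields used in this example. The check below is a sketch under the assumption that `question`, `context`, and `response` are the fields needed; the exact requirements may differ per check, and `find_incomplete_rows` is a hypothetical helper, not part of UpTrain.

```python
# Field names taken from the example dataset above (an assumption of this sketch)
REQUIRED_KEYS = ("question", "context", "response")


def find_incomplete_rows(rows):
    """Return a list of (row_index, missing_keys) for rows lacking a non-empty string field."""
    problems = []
    for index, row in enumerate(rows):
        missing = []
        for key in REQUIRED_KEYS:
            value = row.get(key)
            # Treat absent, non-string, and whitespace-only values as missing
            if not isinstance(value, str) or not value.strip():
                missing.append(key)
        if missing:
            problems.append((index, missing))
    return problems


sample_rows = [
    {"question": "Which is the most popular global sport?",
     "context": "Football is undoubtedly the world's most popular sport.",
     "response": "Football is the most popular sport."},
    {"question": "Which sport is popular in India?", "context": ""},
]
print(find_incomplete_rows(sample_rows))  # [(1, ['context', 'response'])]
```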
Now, let’s use UpTrain to evaluate Context Relevance.
You can find the complete list of metrics supported by UpTrain here.
eval_llm = EvalLLM(settings)
results = eval_llm.evaluate(
    data=data,
    checks=[Evals.CONTEXT_RELEVANCE]
)
print(json.dumps(results, indent=3))
Let’s look at the output of the above code:
[
   {
      "response": "Football is the most popular sport with around 4 billion followers worldwide",
      "score_context_relevance": 1.0,
      "explanation_context_relevance": " {\n \"Reasoning\": \"The extracted context can answer the given query completely. The context provides information about the popularity of various sports and mentions that football is the most popular sport, drawing a followership of more than 4 billion people. Hence, selected choice is A. The extracted context can answer the given query completely.\",\n \"Choice\": \"A\"\n}"
   }
]
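Note that `explanation_context_relevance` in the output above is itself a JSON document stored as a string. A minimal sketch for decoding it follows; `parse_explanation` is a hypothetical helper, and the fallback for non-JSON text is an assumption, since the explanation comes from a model and its format is not guaranteed.

```python
import json


def parse_explanation(raw: str) -> dict:
    """Decode an explanation string into a dict with "Reasoning" and "Choice" keys.

    If the text is not a valid JSON object, wrap the raw text instead of raising.
    """
    fallback = {"Reasoning": raw.strip(), "Choice": None}
    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError:
        return fallback
    return parsed if isinstance(parsed, dict) else fallback


# Shortened version of the explanation string shown in the output above
sample = ' {\n "Reasoning": "The extracted context can answer the given query completely.",\n "Choice": "A"\n}'
explanation = parse_explanation(sample)
print(explanation["Choice"])  # A
```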
According to these evaluations:
- Context Relevance: Since the context contains information on the most popular sport globally, UpTrain has rated the context as fully relevant to the question (score 1.0).
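- Factual Accuracy: This check is not part of the run above, so the output contains no factual accuracy score. Adding Evals.FACTUAL_ACCURACY to the checks list would evaluate whether the facts mentioned in the response are grounded in the context.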
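When the dataset has many rows, reading each score by hand does not scale. The sketch below flags rows whose score is missing or falls below a cutoff; the helper name `low_scoring_rows` and the threshold of 0.5 are arbitrary assumptions for illustration, not UpTrain defaults.

```python
def low_scoring_rows(results, score_key="score_context_relevance", threshold=0.5):
    """Return the rows whose score is missing or strictly below threshold."""
    flagged = []
    for row in results:
        score = row.get(score_key)
        if score is None or score < threshold:
            flagged.append(row)
    return flagged


# Rows shaped like the output shown earlier (values here are illustrative only)
example_results = [
    {"response": "Football is the most popular sport", "score_context_relevance": 1.0},
    {"response": "An off-topic answer", "score_context_relevance": 0.0},
]
for row in low_scoring_rows(example_results):
    print(row["response"])  # An off-topic answer
```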