OpenAI Evals
Run OpenAI Evals using UpTrain
Overview: In this example, we will see how to use UpTrain to run OpenAI evals. You can run any of the standard evals defined in the evals registry, or create a custom one with your own prompts. We will use a Q&A task as an example to highlight both.
Problem: The workflow of our hypothetical Q&A application goes like this,
- User enters a question.
- The query is converted to an embedding, and relevant sections of the documentation are retrieved using nearest-neighbour search.
- The original query and the retrieved sections are passed to a language model (LM), along with a custom prompt to generate a response.
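The retrieval step above can be sketched as follows. The embedding function here is a deliberately naive stand-in (character frequencies) used only to make the pipeline runnable; a real application would call an embedding model.

```python
import math

def embed(text):
    # Stand-in embedding: a real app would call an embedding model.
    # Character frequencies are used purely to illustrate the pipeline.
    vec = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            vec[ord(ch) - ord("a")] += 1.0
    return vec

def cosine(u, v):
    # Cosine similarity between two vectors.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def retrieve(query, docs, k=1):
    # Nearest-neighbour search: rank documents by similarity to the query.
    q = embed(query)
    ranked = sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

docs = [
    "st.button renders a clickable button",
    "st.cache caches function results",
]
print(retrieve("how do I cache results?", docs))
```

The retrieved sections, together with the original query and a custom prompt, are then sent to the language model.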
Our goal is to evaluate the quality of the answers generated by the model using OpenAI evals.
Solution: We illustrate how to use the UpTrain evals framework to assess the chatbot’s performance. We will use a dataset built from logs generated by a chatbot made to answer questions about the Streamlit user documentation. For model grading, we will use GPT-3.5-turbo with the ‘coqa-closedqa-correct’ grading type, and we will also define a custom grading prompt for the same task.
Install required packages
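A minimal set of packages for this walkthrough is shown below; treat it as a starting point, since exact requirements and version pins may differ for your UpTrain version.

```shell
# Install UpTrain and the OpenAI client; streamlit is only needed for
# the optional dashboard step later.
pip install uptrain openai streamlit
```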
Let’s now load our dataset and see how that looks
So, we have a bunch of questions, retrieved documents and final answers (i.e. LLM responses) for them. Let’s evaluate the correctness of these answers using OpenAI evals.
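A dataset in JSON Lines format (one record per line) can be inspected with the standard library. The field names used here (`question`, `document`, `response`) are illustrative and should match your own log schema.

```python
import json

# Illustrative records in the shape described above: a question, the
# retrieved document text, and the LLM's final answer.
sample = [
    {"question": "How do I cache results in Streamlit?",
     "document": "st.cache_data caches function results across reruns.",
     "response": "Use st.cache_data to cache the function's results."},
]

# Write and re-read a small .jsonl file to show the round trip.
with open("qna_sample.jsonl", "w") as f:
    for row in sample:
        f.write(json.dumps(row) + "\n")

with open("qna_sample.jsonl") as f:
    dataset = [json.loads(line) for line in f]

print(len(dataset), dataset[0]["question"])
```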
Using UpTrain Framework to run evals
UpTrain provides an integration with OpenAI evals to run any check defined in the evals registry.
We wrap these evals in an Operator class for ease of use.
- `OpenAIGradeScore`: Calls OpenAI evals with the given `eval_name`. Provide the columns corresponding to the input and the completion.
- `ModelGradeScore`: Define your own model-graded eval: a custom grading prompt, the weightage given to each option, and a mapping linking dataset columns to the variables required in the prompt.
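To make the column-to-variable mapping and option weightage concrete, here is what a custom grading prompt might look like once rendered for one dataset row. The prompt text, variable names, and scores below are illustrative, not UpTrain defaults.

```python
# An illustrative model-grading prompt with placeholders for dataset columns.
GRADING_PROMPT = (
    "You are grading an answer to a question about documentation.\n"
    "Question: {question}\n"
    "Reference section: {document}\n"
    "Submitted answer: {response}\n"
    "Is the submitted answer correct? Reply with one of: Correct, Incorrect."
)

# Mapping from prompt variables to dataset column names.
col_mapping = {"question": "question", "document": "document", "response": "response"}

# Weightage for each grading option (the score assigned when the grader picks it).
choice_scores = {"Correct": 1.0, "Incorrect": 0.0}

row = {
    "question": "How do I add a button?",
    "document": "st.button renders a clickable button.",
    "response": "Call st.button with a label.",
}

rendered = GRADING_PROMPT.format(**{var: row[col] for var, col in col_mapping.items()})
print(rendered)
```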
Now that we have defined our evals, we will wrap them in a `CheckSet` object. A `CheckSet` takes the source (i.e. the test dataset file), the above-defined `Check`, and the directory where we wish to save the results.
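Putting the pieces together might look like the sketch below. The import paths, operator arguments, and column names are assumptions based on the descriptions above and may differ across UpTrain versions; consult the operator documentation for the exact signatures.

```python
# Sketch only -- import paths and argument names are assumptions and may
# differ across UpTrain versions; see the operator documentation.
from uptrain.framework import Check, CheckSet
from uptrain.operators import JsonReader, OpenAIGradeScore, ModelGradeScore

# Standard registry eval, graded on the input/completion columns.
openai_grade = OpenAIGradeScore(
    col_in_input="question",
    col_in_completion="response",
    eval_name="coqa-closedqa-correct",
)

# Custom model-graded eval: our own prompt, option weightage, and the
# mapping from dataset columns to prompt variables.
model_grade = ModelGradeScore(
    grading_prompt_template="Is the submitted answer correct? ...",
    choice_strings=["Correct", "Incorrect"],
    choice_scores={"Correct": 1.0, "Incorrect": 0.0},
    context_vars={"question": "question", "document": "document", "response": "response"},
)

check = Check(name="answer_correctness", operators=[openai_grade, model_grade])
check_set = CheckSet(
    source=JsonReader(fpath="qna-notebook-data.jsonl"),
    checks=[check],
)
```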
Running the checks
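Executing the checks might look like the sketch below, where `check_set` is the `CheckSet` described above. The `setup`/`run` method names and the `Settings` fields are assumptions that may differ across UpTrain versions.

```python
# Sketch -- method and field names are assumptions; check the UpTrain docs
# for your installed version.
from uptrain.framework import Settings

settings = Settings(openai_api_key="your-openai-key-here")  # placeholder key
check_set.setup(settings)
check_set.run()  # writes per-check results to the configured logs directory
```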
[Optional] Visualize results on streamlit
Using UpTrain’s Managed Service
Step 1: Create an API key
To get started, you will first need to get your API key from the UpTrain website.
- Login with Google
- Click on “Create API Key”
- Copy the API key and save it somewhere safe
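With the key in hand, creating an API client might look like the sketch below. The class names and the keyword used to pass the key are assumptions and may differ across UpTrain versions.

```python
# Sketch -- class and keyword names are assumptions; see the UpTrain docs.
from uptrain.framework import Settings
from uptrain.framework.remote import APIClient

settings = Settings(uptrain_access_token="your-api-key-here")  # placeholder key
client = APIClient(settings)
```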
Step 2: Add dataset
Unlike the previous method where you had to create a dataset in Python, this method requires you to upload a file containing your dataset. The supported file formats are:
- .csv
- .json
- .jsonl
You can add the dataset file to the UpTrain platform using the `add_dataset` method.
To upload your dataset file, you will need to specify the following parameters:
- `name`: The name of your dataset
- `fpath`: The path to your dataset file
Let’s say you have a dataset file called `qna-notebook-data.jsonl` in your current directory. You can upload it using the code below.
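A call following the parameter description above might look like this; the dataset name is illustrative, and `client` is assumed to be an UpTrain API client created with your API key.

```python
# Upload the local .jsonl file to the platform under a dataset name.
# `client` is an UpTrain API client; the dataset name is illustrative.
client.add_dataset(name="qna-dataset", fpath="qna-notebook-data.jsonl")
```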
Step 3: Add checkset
A checkset contains the operators you wish to evaluate your model on.
You can add a checkset using the `add_checkset` method.
To add a checkset, you will need to specify the following parameters:
- `name`: The name of your checkset
- `checkset`: The checkset you wish to add
- `settings`: The settings you defined while creating the API client
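A call following the parameter description above might look like this, assuming `client` is your API client, `check_set` is the `CheckSet` object described earlier, and `settings` is the settings object used when creating the client. The checkset name is illustrative.

```python
# Register the checkset under a name on the platform; the name is
# illustrative, and `check_set`/`settings` come from the earlier steps.
client.add_checkset(name="qna-checkset", checkset=check_set, settings=settings)
```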
Step 4: Add run
A run is a combination of a dataset and a checkset.
You can add a run using the `add_run` method.
To add a run, you will need to specify the following parameters:
- `dataset`: The name of the dataset you wish to add
- `checkset`: The name of the checkset you wish to add
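A call following the parameter description above might look like this; the names are illustrative and must match the names you used when adding the dataset and the checkset.

```python
# Pair the uploaded dataset with the registered checkset and start a run.
# `client` is an UpTrain API client; names must match the earlier uploads.
client.add_run(dataset="qna-dataset", checkset="qna-checkset")
```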
Step 5: View the results
You can view the results of your evaluation using the `get_run` method.
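A sketch of fetching results is shown below; the exact arguments `get_run` expects are an assumption here (the run is identified by its dataset and checkset names in this sketch), so check the method's documentation for your UpTrain version.

```python
# Sketch -- argument shape is an assumption; names must match the ones
# used when the dataset and checkset were added.
run = client.get_run("qna-dataset", "qna-checkset")
print(run)  # inspect status and per-check scores
```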
You can also view the results on the UpTrain Dashboard by entering your API key as a password.