Perform A/B testing on your data with UpTrain
Experiments help you perform A/B testing, so you can compare and choose the options most suitable for you. This notebook shows you how to perform experiments with UpTrain.
The experiment we will demonstrate compares the responses a model gives when it is passed contexts of different lengths. This is done by using a chunk_size parameter that limits the number of tokens in the context passed to the model.
We will only look at the code that is specific to performing experiments. We will not be looking at the entire process of extracting the context and generating the response. To learn more about that, please refer to the Data Driven Experimentation Demo.
Run the following commands in your terminal to install UpTrain:
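UpTrain is published on PyPI, so a standard pip install should be all that is needed:

```bash
pip install uptrain
```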
Before we can start using UpTrain, we need to create an API client. You can do this by passing your API key to the APIClient constructor.
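Here is a minimal sketch of creating the client. Reading the key from an environment variable is just one option, and the uptrain_api_key keyword shown here may differ slightly between versions:

```python
import os

from uptrain import APIClient

# Your UpTrain API key (the environment variable name is illustrative).
UPTRAIN_API_KEY = os.environ.get("UPTRAIN_API_KEY")

# Create the API client by passing the key to the APIClient constructor.
client = APIClient(uptrain_api_key=UPTRAIN_API_KEY)
```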
You can define your data as a simple dictionary with the following keys:

- question: The question you want to ask
- context: The context relevant to the question
- response: The response to the question

Here, we will perform A/B testing based on chunk size. This value is also passed as a key in the data dictionary, as shown in the example rows below:

- chunk_size: The limit on the number of tokens in the context
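For illustration, each row is a dictionary with these keys, and the rows are collected into a list. The questions, contexts, and responses below are made-up placeholders; only the structure matters here:

```python
data = [
    {
        "question": "What does the chunk_size parameter control?",
        "context": "...context retrieved with a 200-token limit...",
        "response": "...model response generated from the 200-token context...",
        "chunk_size": 200,
    },
    {
        "question": "What does the chunk_size parameter control?",
        "context": "...context retrieved with a 1000-token limit...",
        "response": "...model response generated from the 1000-token context...",
        "chunk_size": 1000,
    },
]
```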
Now that we have our data, we can perform experiments on it using UpTrain. We use the evaluate_experiments method to do this. This method takes the following arguments (a sketch of the full call follows the list):
- project_name: The name of your project
- data: The data you want to log and evaluate
- evals: The evaluations you want to perform on your data
- exp_columns: A list of all the columns that act as identifiers to indicate which experiment a row belongs to. You can enter multiple column names here.

You can find the list of all available evaluations here.
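Putting this together, a minimal sketch of the call might look like the following. The project name is arbitrary, and the three evaluations shown are assumed to correspond to the Factual Accuracy, Context Relevance, and Response Relevance scores reported below; check the Evals enum in your UpTrain version for the exact names.

```python
from uptrain import Evals

results = client.evaluate_experiments(
    project_name="Chunk-Size-Experiment",  # any project name of your choice
    data=data,                             # the rows defined above
    evals=[
        Evals.FACTUAL_ACCURACY,
        Evals.CONTEXT_RELEVANCE,
        Evals.RESPONSE_RELEVANCE,
    ],
    exp_columns=["chunk_size"],            # column(s) identifying each experiment
)
```

Passing exp_columns=["chunk_size"] marks chunk_size as the experiment identifier, so rows generated with different chunk sizes can be compared side by side.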
We can use these results to compare how the model's response changes when the context length is changed. The differences are clearer with a larger dataset, but the process is the same.
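The exact shape of the returned results may vary by version, but assuming they can be loaded into a pandas DataFrame with one row per evaluated sample and one column per score (the score_ column prefix below is an assumption), a quick side-by-side comparison could look like this:

```python
import pandas as pd

# Assumption: `results` can be converted into a DataFrame whose score columns
# share a common prefix such as "score_factual_accuracy". Adjust the column
# names to match what your UpTrain version actually returns.
df = pd.DataFrame(results)

score_columns = [col for col in df.columns if col.startswith("score_")]

# Average each score per chunk_size to see how context length affects quality.
comparison = df.groupby("chunk_size")[score_columns].mean()
print(comparison)
```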
Factual Accuracy Score:
Context Relevance Score:
Response Relevance Score:
Access UpTrain Dashboards: The evaluation results are also available at https://demo.uptrain.ai/dashboard/; the same API key can be used to access the dashboards. Here's a sample screenshot of the above evaluation performed on a larger dataset in the Data Driven Experimentation Demo.