Overview: In this example, we will see how you can use UpTrain’s framework to create customized evaluations for your models. We will create a Check that runs two Operators on a given column of data and generates a table and a histogram as the output.
When to use: You can use UpTrain’s framework to create customized evaluations for your models. This is useful when you want to run a specific set of checks on your data and generate a specific set of plots. You can also combine UpTrain’s pre-built Operators to create your own Checks.
Install UpTrain with all dependencies
```bash
pip install uptrain
uptrain-add --feature full
```
Choose and create Operators
An Operator runs a function on a given column of data and returns the result in a new column.
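Conceptually, you can picture an Operator as a function that maps an input column to an output column. Here is a plain-Python sketch of that idea (this is illustrative only, not UpTrain’s actual implementation):

```python
# Illustrative sketch of the Operator concept (not UpTrain code):
# read one column from each row, compute a value, and write it
# into a new output column.
def text_length_op(rows, col_in, col_out):
    for row in rows:
        row[col_out] = len(row[col_in])
    return rows

data = [{"model_response": "Hello world"}, {"model_response": "Hi"}]
out = text_length_op(data, "model_response", "response_length")
```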
In our case, we want to use the GrammarScore and TextLength Operators. So, we will create them by specifying the input and output column names as follows:
```python
from uptrain.operators import GrammarScore, TextLength

grammar_score = GrammarScore(
    col_in_text="model_response",
    col_out="grammar_correctness_score",
)
text_length = TextLength(
    col_in_text="model_response",
    col_out="response_length",
)
```
Create the Checks
A Check runs the given list of operators in the order in which they are specified by the user.
A Check in UpTrain takes three arguments:
- name - The name of the check
- operators - The operators that are to be run when the check is executed
- plots - The plots that are to be generated when the check is executed
```python
from uptrain.framework import Check
from uptrain.operators import Histogram, Table

response_analysis = Check(
    name="response_analysis",
    operators=[grammar_score, text_length],
    plots=[
        Table(),
        # Plots response length against grammatical correctness;
        # verify the parameter names against your UpTrain version.
        Histogram(x="response_length", color="grammar_correctness_score"),
    ],
)
```
Here, we show a table with all the columns and a histogram that depicts the relationship between the length of the responses and their grammatical correctness.
You can create more than one Check, but for the sake of simplicity, we will just create one.
Using pre-built Checks
UpTrain comes with many pre-built Checks that you can use. These have been pre-configured with operators and plots. You can use them as follows:
```python
from uptrain.framework.builtins import CheckResponseCompleteness

response_completeness_check = CheckResponseCompleteness()
```
To learn more about the pre-built Checks, check out their documentation.
Configure the Settings
We need to tell UpTrain where we want our results to be stored (logs_folder). Since GrammarScore uses OpenAI’s API, we also need to provide an OpenAI API key.
To learn more about how to get an OpenAI API key and set it as an environment variable, check out this guide. It is recommended to store your API key as an environment variable for security reasons.
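For example, on Linux or macOS you can export the key in your shell before running the script (the value below is a placeholder, not a real key):

```shell
# Replace the placeholder with your actual OpenAI API key.
export OPENAI_API_KEY="your-api-key-here"
```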
We configure the settings for our checks as follows:
```python
import os

from uptrain.framework import Settings

LOGS_DIR = "/tmp/uptrain_logs"
OPENAI_API_KEY = os.environ["OPENAI_API_KEY"]

settings = Settings(
    logs_folder=LOGS_DIR,
    openai_api_key=OPENAI_API_KEY,
)
```
Create a CheckSet
You can create as many Checks as you want and add them to a CheckSet, which will run all of them and let you view their results individually. Here, we add the response_analysis Check we created above along with the pre-built response_completeness_check.
Here, we use UpTrain’s CsvReader to read the content from our CSV file containing the responses.
```python
from uptrain.framework import CheckSet
from uptrain.operators import CsvReader

check_set = CheckSet(
    checks=[response_analysis, response_completeness_check],
    source=CsvReader(fpath="responses.csv"),
)
```
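The CsvReader expects responses.csv to contain the columns that the Operators read. As a hypothetical example (the response texts below are made up), you can generate a minimal file like this:

```python
import csv

# Hypothetical sample data: the Operators above read the
# "model_response" column, so responses.csv must contain it.
rows = [
    {"model_response": "UpTrain is an open-source framework for evaluating LLM applications."},
    {"model_response": "It provides customizable checks and plots."},
]

with open("responses.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["model_response"])
    writer.writeheader()
    writer.writerows(rows)
```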
Set up and run the CheckSet
Now, we will set up the CheckSet with the settings we created above and run it.
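A minimal sketch of this step, continuing from the objects defined above; `setup` and `run` follow the UpTrain CheckSet API, but verify the exact method signatures against the UpTrain version you have installed:

```python
# Bind the settings (logs folder, API key) to the CheckSet,
# then execute all Checks on the data read by the CsvReader.
# Results are written to the logs folder configured in Settings.
check_set.setup(settings)
check_set.run()
```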
Visualize the results
Finally, we use UpTrain’s
StreamlitRunner to visualize the results.
```python
from uptrain.dashboard import StreamlitRunner

st_runner = StreamlitRunner(LOGS_DIR)
st_runner.start()
```
This will open a new tab in your browser with the UpTrain dashboard, where you can view the results of the Checks we created above.
Here is a screenshot of the dashboard: