Allows you to create your own set of evaluations
Each LLM application has its own unique needs, so no one-size-fits-all evaluation tool is possible.
A sales assistant bot needs to be evaluated differently from a calendar automation bot.
Custom prompts let you grade your model exactly the way you want.
Parameters:
prompt
: Evaluation prompt used to generate the grade

choices
: List of choices/grades to choose from

choices_scores
: Scores associated with each choice

eval_type
: One of ["classify", "cot_classify"], determines whether chain-of-thought prompting is applied

prompt_var_to_column_mapping (optional)
: Mapping between variables defined in the prompt and column names in the data

Sample Response:
Here, we have evaluated the data according to the above-mentioned prompt.
Though the response seems correct, it does not completely answer the question according to the information provided in the context.
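To illustrate how the parameters above fit together, here is a minimal sketch in plain Python. The variable names mirror the documented parameters, but the helper functions (`render_prompt`, `score_for`) and the column names are hypothetical, not part of the library's actual API: the point is only to show how `prompt_var_to_column_mapping` fills the prompt from a data row and how `choices_scores` pairs with `choices` by position.

```python
# Hypothetical sketch of how the custom-prompt parameters interact.
# Helper names and column names are illustrative, not the library's API.

prompt = (
    "You are grading an answer.\n"
    "Question: {question}\nContext: {context}\nResponse: {answer}\n"
    "Does the response fully answer the question using only the context?"
)
choices = ["Complete", "Incomplete"]
choices_scores = [1.0, 0.0]  # score paired with each choice by position
eval_type = "cot_classify"   # would ask the grader to reason step by step
prompt_var_to_column_mapping = {
    "question": "question_col",
    "context": "context_col",
    "answer": "response_col",
}

def render_prompt(row: dict) -> str:
    """Fill prompt variables from a data row using the column mapping."""
    values = {var: row[col] for var, col in prompt_var_to_column_mapping.items()}
    return prompt.format(**values)

def score_for(choice: str) -> float:
    """Look up the score associated with the grade the LLM chose."""
    return choices_scores[choices.index(choice)]

row = {"question_col": "Q?", "context_col": "Ctx.", "response_col": "A."}
print(render_prompt(row))
print(score_for("Incomplete"))  # 0.0
```

In a real run, the rendered prompt would be sent to the grading LLM, whose chosen grade is then mapped to its numeric score as `score_for` does here.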