Overview
Quickest way to perform evaluations on your data
UpTrain provides a simple way to perform evaluations on your data. You can pass any of these evals to the `evaluate` function of the `EvalLLM` class, and it will automatically perform the evaluation.
These evals require a combination of the following columns to be present in your data:

- `question`: The question you want to ask
- `context`: The context relevant to the question
- `response`: The response to the question
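For example, a minimal run over a single row looks like the sketch below. It follows the quickstart pattern from UpTrain's docs; the sample data and API key are placeholders, and exact check names may vary by version.

```python
from uptrain import EvalLLM, Evals

# Sample row with the three columns described above (placeholder values).
data = [{
    "question": "Which is the most popular global sport?",
    "context": "Football (soccer) is played in over 200 countries and "
               "is followed by billions of fans worldwide.",
    "response": "Football is the most popular sport globally.",
}]

eval_llm = EvalLLM(openai_api_key="sk-***")  # placeholder API key

results = eval_llm.evaluate(
    data=data,
    checks=[Evals.CONTEXT_RELEVANCE, Evals.RESPONSE_RELEVANCE],
)
print(results)
```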
Some evals may require additional parameters to be passed to them. These are called parametric evals. Any eval below that has a Parameters section is a parametric eval.
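As a sketch of a parametric eval: the Tonality check is exposed in UpTrain's Python API as `CritiqueTone` (per its docs; names may vary by version) and takes the target persona as a parameter. The persona string below is an arbitrary example.

```python
from uptrain import EvalLLM, CritiqueTone

data = [{
    "question": "How do I reset my password?",
    "response": "Click 'Forgot password' on the login page and follow the emailed link.",
}]

eval_llm = EvalLLM(openai_api_key="sk-***")  # placeholder API key

# Parametric evals are constructed with their parameters, then passed as checks.
results = eval_llm.evaluate(
    data=data,
    checks=[CritiqueTone(llm_persona="patient customer-support agent")],
)
```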
You can choose evals as per your needs. We have divided them into a few categories for your convenience:
Ground Truth Comparison Evals
Eval | Description |
---|---|
Response Matching | Grades how well the generated response matches the provided ground truth. |
Response Quality Evals
Eval | Description |
---|---|
Response Completeness | Grades whether the response has answered all the aspects of the question specified. |
Response Conciseness | Grades how concise the generated response is, and whether it includes information irrelevant to the question asked. |
Response Relevance | Grades how relevant the generated response is to the question specified. |
Response Validity | Grades whether the generated response is valid. A response is considered valid if it contains any information. |
Response Consistency | Grades how consistent the response is with the question asked as well as with the context provided. |
Context Awareness Evals
Eval | Description |
---|---|
Context Relevance | Grades how relevant the context was to the question specified. |
Context Utilization | Grades how complete the generated response was for the question specified, given the information provided in the context. |
Factual Accuracy | Grades whether the generated response is factually correct and grounded in the provided context. |
Context Conciseness | Grades whether the condensed context cited from the original context omits irrelevant information while retaining what is relevant. |
Context Reranking | Evaluates how effective the reranked context is compared to the original context. |
Security Evals
Eval | Description |
---|---|
Prompt Injection | Grades whether the user’s prompt is an attempt to make the LLM leak its system prompt. |
Jailbreak Detection | Grades whether the user’s prompt is an attempt to jailbreak the model (i.e., to generate illegal or harmful responses). |
Language Quality Evals
Eval | Description |
---|---|
Language Features | Grades the quality and effectiveness of language in the response, focusing on factors such as clarity, coherence, and grammar. |
Tonality | Grades whether the generated response matches the required persona’s tone. |
Query Clarity Evals
Eval | Description |
---|---|
Sub-query Completeness | Evaluates whether the list of generated sub-questions comprehensively covers all aspects of the main question. |
Multi-query Accuracy | Evaluates how accurately the variations of the query represent the same question. |
Code Related Evals
Conversation Evals
Eval | Description |
---|---|
User Satisfaction | Grades the conversations between the user and the LLM/AI assistant. |
Creating Custom Evals
Eval | Description |
---|---|
Custom Guideline | Grades how well the LLM adheres to a provided guideline when giving a response. |
Custom Prompts | Allows you to create your own set of evaluations. |
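For instance, the Custom Guideline eval is available in UpTrain's Python API as `GuidelineAdherence`. A minimal sketch follows; the guideline text and its name are made-up examples.

```python
from uptrain import EvalLLM, GuidelineAdherence

data = [{
    "question": "What is my account balance?",
    "response": "Please log in to your account to view your balance.",
}]

eval_llm = EvalLLM(openai_api_key="sk-***")  # placeholder API key

# The guideline is a free-form instruction the response is graded against.
results = eval_llm.evaluate(
    data=data,
    checks=[GuidelineAdherence(
        guideline="The response must not reveal any personal account details.",
        guideline_name="no-personal-details",  # example label used in the results
    )],
)
```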