Quickest way to perform evaluations on your data
Provide your data to the `evaluate` function in the `EvalLLM` class and it will automatically perform the evaluation.

These evals require a combination of the following columns to be present in your data:

- `question`: The question you want to ask
- `context`: The context relevant to the question
- `response`: The response to the question

An eval with a Parameters section is a parametric eval.
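A minimal sketch of the flow described above: build rows with the three expected columns, then pass them to `EvalLLM.evaluate` along with the checks you want. The sample row and the specific checks chosen here are illustrative assumptions, and the call is guarded behind an API key so the sketch stays runnable without one.

```python
import os

# Each row supplies the columns the evals consume:
# question, context, and response.
data = [
    {
        "question": "What is the capital of France?",
        "context": "France is a country in Europe. Its capital is Paris.",
        "response": "The capital of France is Paris.",
    }
]

# Running the evals calls an LLM, so it needs an API key.
# The checks listed here are an illustrative subset of the
# categories described below.
if os.environ.get("OPENAI_API_KEY"):
    from uptrain import EvalLLM, Evals

    eval_llm = EvalLLM(openai_api_key=os.environ["OPENAI_API_KEY"])
    results = eval_llm.evaluate(
        data=data,
        checks=[Evals.CONTEXT_RELEVANCE, Evals.RESPONSE_RELEVANCE],
    )
    print(results)
```

Each entry in `results` mirrors an input row, with additional score columns added per check.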
You can choose evals as per your needs. We have divided them into a few categories for your convenience:
Ground Truth Comparison Evals
| Eval | Description |
|---|---|
| Response Matching | Grades how well the generated response matches the provided ground truth response. |
Response Quality Evals
| Eval | Description |
|---|---|
| Response Completeness | Grades whether the response has answered all the aspects of the question specified. |
| Response Conciseness | Grades how concise the generated response is, i.e. whether it contains any additional irrelevant information for the question asked. |
| Response Relevance | Grades how relevant the generated response is to the question specified. |
| Response Validity | Grades whether the response generated is valid or not. A response is considered valid if it contains any information. |
| Response Consistency | Grades how consistent the response is with the question asked as well as with the context provided. |
Context Awareness Evals
| Eval | Description |
|---|---|
| Context Relevance | Grades how relevant the context was to the question specified. |
| Context Utilization | Grades how complete the generated response was for the question specified, given the information provided in the context. |
| Factual Accuracy | Grades whether the response generated is factually correct and grounded in the provided context. |
| Context Conciseness | Evaluates whether the concise context extracted from the original context is free of irrelevant information. |
| Context Reranking | Evaluates how effective the reranked context is compared to the original context. |
Security Evals
| Eval | Description |
|---|---|
| Prompt Injection | Grades whether the generated response is leaking any system prompt. |
| Jailbreak Detection | Grades whether the user's prompt is an attempt to jailbreak (i.e. generate illegal or harmful responses). |
Language Quality Evals
| Eval | Description |
|---|---|
| Language Features | Grades the language quality of the response, including clarity, coherence, and conciseness. |
| Tonality | Grades whether the generated response matches the required persona's tone. |
Query Clarity Evals
| Eval | Description |
|---|---|
| Sub-query Completeness | Evaluates whether the list of generated sub-questions comprehensively covers all aspects of the main question. |
| Multi-query Accuracy | Evaluates how accurately the variations of the query represent the same question. |
Code Related Evals
Conversation Evals
| Eval | Description |
|---|---|
| User Satisfaction | Grades user satisfaction in conversations between the user and the LLM/AI assistant. |
Creating Custom Evals
| Eval | Description |
|---|---|
| Custom Guideline | Grades how well the LLM adheres to a provided guideline when giving a response. |
| Custom Prompts | Allows you to create your own set of evaluations. |
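A custom guideline check can be sketched as follows. The guideline text, sample row, and the `GuidelineAdherence` parameter names used here are assumptions for illustration; the LLM call is guarded behind an API key so the sketch runs without one.

```python
import os

# A guideline the response should follow, plus a sample row to grade.
# Both are illustrative assumptions.
guideline = "Response should not recommend any specific stock to buy."
data = [
    {
        "question": "Which tech stock should I invest in?",
        "response": "I can't recommend specific stocks, but diversifying is wise.",
    }
]

# Running the check calls an LLM, so it is guarded behind an API key.
# GuidelineAdherence is assumed to take the guideline text plus a name
# used to label the resulting score column.
if os.environ.get("OPENAI_API_KEY"):
    from uptrain import EvalLLM, GuidelineAdherence

    eval_llm = EvalLLM(openai_api_key=os.environ["OPENAI_API_KEY"])
    results = eval_llm.evaluate(
        data=data,
        checks=[
            GuidelineAdherence(
                guideline=guideline,
                guideline_name="no_stock_tips",
            )
        ],
    )
    print(results)
```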