UpTrain provides a simple and easy way to perform evaluations on your data. You can pass any of these Evals to the evaluate method of the EvalLLM class and it will automatically perform the evaluation. These evals require a combination of the following columns to be present in your data:
  • question: The question you want to ask
  • context: The context relevant to the question
  • response: The response to the question
Some evals may require additional parameters to be passed to them. These are called parametric evals. Any eval below that has a Parameters section is a parametric eval. You can choose evals as per your needs. We have divided them into a few categories for your convenience:
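The flow described above can be sketched as follows. This is a minimal sketch, assuming the uptrain package is installed and an OpenAI API key is available in the environment; the sample question, context, and response values are illustrative:

```python
import os

# Each row supplies the columns listed above.
data = [{
    "question": "What is the capital of France?",
    "context": "France is a country in Western Europe. Its capital is Paris.",
    "response": "The capital of France is Paris.",
}]

try:
    # pip install uptrain
    from uptrain import EvalLLM, Evals
except ImportError:
    EvalLLM = None

# The evaluation calls out to an LLM, so it only runs when the
# package is installed and an OpenAI API key is configured.
if EvalLLM is not None and os.getenv("OPENAI_API_KEY"):
    eval_llm = EvalLLM(openai_api_key=os.environ["OPENAI_API_KEY"])
    results = eval_llm.evaluate(
        data=data,
        checks=[Evals.CONTEXT_RELEVANCE, Evals.RESPONSE_RELEVANCE],
    )
    print(results)
```

Passing several checks in one evaluate call scores every row against each of them in a single pass.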
Ground Truth Comparison
  • Response Matching: Grades how well the generated response matches the provided gold (ideal) response.
Response Quality
  • Response Completeness: Grades whether the response has answered all the aspects of the question specified.
  • Response Conciseness: Grades how concise the generated response is, i.e. whether it contains any additional information irrelevant to the question asked.
  • Response Relevance: Grades how relevant the generated response was to the question specified.
  • Response Validity: Grades whether the generated response is valid. A response is considered valid if it contains any information.
  • Response Consistency: Grades how consistent the response is with the question asked as well as with the context provided.
Context Awareness
  • Context Relevance: Grades how relevant the context was to the question specified.
  • Context Utilization: Grades how complete the generated response was for the question specified, given the information provided in the context.
  • Factual Accuracy: Grades whether the generated response is factually correct and grounded in the provided context.
  • Context Conciseness: Evaluates the concise context cited from an original context for irrelevant information.
  • Context Reranking: Evaluates how effective the reranked context is compared to the original context.
Security
  • Prompt Injection: Grades whether the generated response leaks any part of the system prompt.
  • Jailbreak Detection: Grades whether the user's prompt is an attempt to jailbreak the model (i.e. to make it generate illegal or harmful responses).
Language Quality
  • Language Features: Grades the language quality of the response, covering aspects such as fluency, coherence, and grammar.
  • Tonality: Grades whether the generated response matches the required persona's tone.
Query Clarity
  • Sub-query Completeness: Evaluates whether the list of generated sub-questions comprehensively covers all aspects of the main question.
  • Multi-query Accuracy: Evaluates how accurately the variations of the query represent the same question.
Conversation
  • User Satisfaction: Grades the conversations between the user and the LLM/AI assistant.
Custom Evaluations
  • Custom Guideline: Grades how well the LLM adheres to a provided guideline when giving a response.
  • Custom Prompts: Allows you to create your own set of evaluations.
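A custom guideline is an example of a parametric eval: it takes arguments at construction time. Below is a minimal sketch, assuming UpTrain's GuidelineAdherence check with its guideline and guideline_name parameters; the guideline text and data row are illustrative:

```python
import os

# Illustrative custom guideline; parametric evals take such
# arguments when the check is constructed.
GUIDELINE = "The response must not ask the user for personal data."

try:
    # pip install uptrain
    from uptrain import EvalLLM, GuidelineAdherence
except ImportError:
    EvalLLM = None

# Runs only when the package is installed and an OpenAI key is set.
if EvalLLM is not None and os.getenv("OPENAI_API_KEY"):
    check = GuidelineAdherence(
        guideline=GUIDELINE,
        guideline_name="no_personal_data",  # labels the score column
    )
    eval_llm = EvalLLM(openai_api_key=os.environ["OPENAI_API_KEY"])
    results = eval_llm.evaluate(
        data=[{
            "question": "How do I reset my password?",
            "response": "Click 'Forgot password' on the login page.",
        }],
        checks=[check],
    )
    print(results)
```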