A Check in UpTrain runs a series of Operators in the order in which they are specified.

A Check in UpTrain takes three arguments:

  1. name - The name of the check
  2. operators - The operators that are to be run when the check is executed
  3. plots - The Charts that you want to see generated when the check is executed

Checks can be run on their own using the run() method. To make full use of them, however, you can bundle them into a CheckSet and run all your Checks at once.
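As a quick sketch, here is how these pieces fit together. The import paths, the ResponseCompleteness operator, the Histogram plot, and the column names below are assumptions used for illustration, not guaranteed API; substitute the operators your version of UpTrain provides.

```python
import polars as pl

from uptrain.framework import Check, Settings
from uptrain.operators import Histogram, ResponseCompleteness  # names assumed

check = Check(
    name="response_completeness",                        # 1. name
    operators=[ResponseCompleteness()],                  # 2. operators, run in order
    plots=[Histogram(x="score_response_completeness")],  # 3. plots
)

# Run the check on its own. The data is assumed to be a polars DataFrame
# whose columns match what the operators expect, and setup() is assumed
# to prepare the check (Settings typically carries model/API-key config).
data = pl.DataFrame({
    "question": ["What is UpTrain and how to install it?"],
    "response": ["UpTrain is an open-source LLM evaluation tool."],
})
results = check.setup(Settings(openai_api_key="sk-...")).run(data)
```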

To learn more about Checks, see the Check Documentation.

Built-in Checks

UpTrain comes with a number of built-in Checks that you can use to get started. These are divided into three main categories, as follows:

Before you ask your LLM to answer a question, you need to provide it with some context. This context can be a paragraph, a document, or a set of documents. These checks help you make sure that the context is relevant to your question and that it contains factual information.

CheckContextRelevance

This check measures if the retrieved context has sufficient information to answer the given question.

For example: If the question is ‘What are the key features of UpTrain?’ and the retrieved context only has information about how to install UpTrain, the context retrieval quality will be 0.1, as the retrieved context doesn’t have sufficient information to answer the given question.
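A minimal sketch of running this check on its own, assuming the built-in is exposed as a ready-made Check under uptrain.framework.builtins and that it reads "question" and "context" columns (both assumptions):

```python
import polars as pl

from uptrain.framework import Settings
from uptrain.framework.builtins import CheckContextRelevance  # import path assumed

data = pl.DataFrame({
    "question": ["What are the key features of UpTrain?"],
    "context": ["UpTrain can be installed by running `pip install uptrain`."],
})

# The context covers installation, not features, so a low score
# (around 0.1, as described above) is the expected outcome.
results = CheckContextRelevance.setup(Settings()).run(data)
```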

CheckResponseFacts

This check measures hallucinations, i.e. whether the response contains any made-up information with respect to the provided context.

For example: If the question is ‘What are the primary greenhouse gases?’, a reliable response would be ‘The main greenhouse gases responsible for trapping heat in the Earth’s atmosphere are carbon dioxide, methane, and water vapour.’, which we can give a factual accuracy score of 0.9. For the same question, the response ‘The primary greenhouse gases include helium balloons, fairy dust, and unicorn breath.’ would yield a factual accuracy score of 0.1.
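Under the same assumptions as the sketch above, checking for hallucinations would also pass the model's response alongside the context:

```python
import polars as pl

from uptrain.framework import Settings
from uptrain.framework.builtins import CheckResponseFacts  # import path assumed

data = pl.DataFrame({
    "question": ["What are the primary greenhouse gases?"],
    "context": ["Carbon dioxide, methane, and water vapour trap heat in the Earth's atmosphere."],
    "response": ["The primary greenhouse gases include helium balloons, fairy dust, and unicorn breath."],
})

# A fabricated response like this one should score near 0.1.
results = CheckResponseFacts.setup(Settings()).run(data)
```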

The response generated by your LLM might not be perfect. You can use these checks to ensure that the response you get is relevant and complete.

CheckResponseCompleteness

This check measures if the response answers all aspects of the given question.

For example: If the question is ‘What is UpTrain and how to install it?’ and the LLM response is ‘UpTrain is an open-source LLM evaluation tool to check your application’s performance on aspects like hallucinations, response quality, language quality, bias, etc.’, the completeness score should be 0.5, as the response didn’t answer one part of the question, i.e. how to install UpTrain.
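A sketch under the same assumptions; this check needs only the question and the response:

```python
import polars as pl

from uptrain.framework import Settings
from uptrain.framework.builtins import CheckResponseCompleteness  # import path assumed

data = pl.DataFrame({
    "question": ["What is UpTrain and how to install it?"],
    "response": ["UpTrain is an open-source LLM evaluation tool."],
})

# The response ignores the installation half of the question, so a
# completeness score of about 0.5 is the expected outcome.
results = CheckResponseCompleteness.setup(Settings()).run(data)
```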

CheckResponseRelevance

This check measures if the response contains any irrelevant information.

For example: If the question is ‘What is UpTrain?’ and the LLM response is ‘UpTrain is an open-source LLM evaluation tool to check your application’s performance on aspects like hallucinations, response quality, language quality, bias, etc. You can install UpTrain by simply running the command - pip install uptrain’, the relevancy score should be 0.5, as it includes additional (irrelevant) information on how to install UpTrain.
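Rather than running the two response checks one at a time, they can be bundled into a CheckSet. The JsonReader source operator and the way it is wired in below are assumptions for illustration:

```python
from uptrain.framework import CheckSet, Settings
from uptrain.framework.builtins import (  # import paths assumed
    CheckResponseCompleteness,
    CheckResponseRelevance,
)
from uptrain.operators import JsonReader  # source operator; name assumed

# Each record in data.jsonl is assumed to carry "question" and "response" keys.
check_set = CheckSet(
    source=JsonReader(fpath="./data.jsonl"),
    checks=[CheckResponseCompleteness, CheckResponseRelevance],
)
check_set.setup(Settings()).run()
```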

Sometimes the response generated by your LLM might be rude or highly technical, which may or may not be desirable. These checks ensure that the response generated by your LLM is smooth and coherent and aligns with the desired tone.

CheckLanguageQuality

This check measures the smoothness and coherence of the language used in the response.

For example: If the question is ‘Can you explain the process of photosynthesis?’ and the LLM response is ‘Photosynthesis is a natural process that occurs in plants and some microorganisms. It involves the conversion of light energy into chemical energy…’, a high fluency score of 0.9 reflects the well-structured, coherent, and smooth conveyance of information. Similarly, for the query ‘What are the benefits of regular exercise?’, if the response is ‘Exercise is good health. It makes body strong and helps the mind too. Many benefits gained.’, we would give it a fluency score of 0.5.

CheckToneQuality

This check measures if the response aligns with a specific persona or desired tone.

For example: Imagine a scenario where the persona is a friendly and informal teacher. For the question ‘Explain the concept of gravity’, a response that matches the desired tonality might be: ‘Hey there! Let’s dive into the awesome world of gravity. You see, it’s this invisible force that pulls things toward each other. Imagine Earth as a giant magnet! So, when you drop a pencil, gravity’s like, ‘Come to me, little pencil!’ Cool, right?’ We can give this a tonality score of 0.9. If instead the same response were given by an LLM acting as a formal and professional scientist, the tonality score assigned would be 0.5.
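A combined sketch for both checks in this section. Whether the built-in tone check takes the desired persona as an argument, and the llm_persona keyword used below, are guesses rather than confirmed API:

```python
import polars as pl

from uptrain.framework import Settings
from uptrain.framework.builtins import (  # import paths assumed
    CheckLanguageQuality,
    CheckToneQuality,
)

data = pl.DataFrame({
    "question": ["Explain the concept of gravity."],
    "response": ["Hey there! Gravity is this invisible force that pulls things toward each other."],
})

# Fluency and coherence of the response text.
CheckLanguageQuality.setup(Settings()).run(data)

# Tone against a target persona; the llm_persona argument is a guess at
# how the desired persona is configured on the check.
tone_check = CheckToneQuality(llm_persona="friendly and informal teacher")
tone_check.setup(Settings()).run(data)
```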

To see them in action, check out our Evaluations Demo.