Checks if the response generated is valid or not. A response is considered to be valid if it contains any information.
In some cases, an LLM might fail to generate a response due to reasons like limited knowledge or the asked question not being clear.Response Validity score can be used to identify these cases, where a model is not generating an informative response.Columns required:
from uptrain import EvalLLM, EvalsOPENAI_API_KEY = "sk-********************" # Insert your OpenAI key heredata = [{ "question": "What is the chemical formula of chlorophyll?", "response": "Sorry, I don't have information about that."}]eval_llm = EvalLLM(openai_api_key=OPENAI_API_KEY)res = eval_llm.evaluate( data = data, checks = [Evals.VALID_RESPONSE])
By default, we are using GPT 3.5 Turbo for evaluations. If you want to use a different model, check out this tutorial.
Sample Response:
Copy
[ { "score_valid_response": 0.0, "explanation_valid_response": "Step 1: Identify the question and response.\nQuestion: What is the chemical formula of chlorophyll?\nResponse: Sorry, I don't have information about that.\n\nStep 2: Analyze the response.\nThe response states that the AI assistant does not have information about the chemical formula of chlorophyll. It does not provide any specific information about the chemical formula.\n\nStep 3: Evaluate the response based on the given criteria.\nThe response does not contain any information about the chemical formula of chlorophyll. It simply states the lack of information.\n\nConclusion:\nThe given response does not contain any information.\n\n[Choice]: B\n[Explanation]: The response does not contain any information about the chemical formula of chlorophyll. Therefore, the selected choice is B." }]
A higher response validity score reflects that the generated response is valid.
The response generated is not considered to be a valid response as the model has not generated any information on the quesktion asked i.e. “What is the formula of chlorophyll”Rather the generated response is an argument reflecting the model’s inability to generate information.Resulting in a low response validity score.