question: The question asked by the user
context: Information retrieved to answer the question
response: The response given by the model
How to use it?
By default, we are using GPT 3.5 Turbo for evaluations. If you want to use a different model, check out this tutorial.
A higher code hallucination score reflects that the generated response contains code that is not grounded by the context.
The context mentions pip install pandas as the code required to install the Pandas package in Python, while the generated response contains import pandas as pd, which does not appear in the context. Because the code in the response is not grounded in the context, this results in a high code hallucination score.
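To make the idea of "code grounded in the context" concrete, here is a minimal sketch using the sample above. It is not how the evaluation itself works (that relies on an LLM, GPT 3.5 Turbo by default); it is only a plain substring check. The function name naive_code_hallucination_score is hypothetical, and the sketch assumes the code snippets have already been extracted from the response.

```python
def naive_code_hallucination_score(context: str, response_snippets: list[str]) -> float:
    """Return the fraction of response code snippets that do not appear in the context.

    Hypothetical helper for illustration only: 0.0 means every snippet is
    found in the context, 1.0 means none of them are.
    """
    if not response_snippets:
        # No code in the response, so there is nothing to hallucinate.
        return 0.0

    # Collapse whitespace so line breaks and indentation do not affect matching.
    normalized_context = " ".join(context.split())
    ungrounded = [
        snippet
        for snippet in response_snippets
        if " ".join(snippet.split()) not in normalized_context
    ]
    return len(ungrounded) / len(response_snippets)


# Sample data mirroring the example above (the context wording is assumed).
context = "To install the Pandas package on Python, run: pip install pandas"

# Code that appears in the response but not in the context.
score = naive_code_hallucination_score(context, ["import pandas as pd"])

# Code that is present in the context.
grounded_score = naive_code_hallucination_score(context, ["pip install pandas"])

print(score, grounded_score)
```

In this sketch, import pandas as pd is not found in the context, so it counts as ungrounded, whereas pip install pandas is found and counts as grounded. An LLM-based evaluator can also judge paraphrased or partially matching code, which a substring check cannot.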