response
: The response given by the model
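The check needs only one input column. The sketch below shows one way to prepare input data for it. It assumes the data is a list of records keyed by column name, which may not match the library's actual format. The helper validate_rows is hypothetical. It flags any record whose response is missing, not a string, or empty.

```python
from typing import Any

# Hypothetical input rows: the only column this check needs is "response",
# i.e. the model output to be evaluated.
data: list[dict[str, Any]] = [
    {"response": "Paris is the capital of France."},
    {"response": "i dunno, paris maybe??"},
]

REQUIRED_COLUMNS = ("response",)


def validate_rows(rows: list[dict[str, Any]]) -> list[int]:
    """Return indices of rows missing a required column or holding an empty/non-string value."""
    bad: list[int] = []
    for i, row in enumerate(rows):
        for col in REQUIRED_COLUMNS:
            value = row.get(col)
            if not isinstance(value, str) or not value.strip():
                bad.append(i)
                break  # one problem is enough to flag the row
    return bad
```

Checking inputs before sending them keeps the evaluation from spending model calls on rows that have nothing to grade.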
How to use it?
By default, we use GPT-3.5 Turbo for evaluations. If you want to use a different model, check out this tutorial.
A higher language features score indicates a better response.
How does it work?
We evaluate language features by determining which of the following three cases applies to the given task data, across features such as fluency, politeness, grammatical correctness, and coherence:
- The response is highly rated on these features.
- The response is moderately rated on these features.
- The response is poorly rated on these features.
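The three-case grading can be sketched as follows. This is a minimal sketch, not the actual implementation. In the real check, an LLM judge picks the case. Here a hypothetical grader callable stands in for it, and toy_grader is a crude placeholder heuristic used for illustration only. The numeric values in CASE_SCORES are also an assumption. The documentation only states that higher scores mean better responses.

```python
from enum import Enum
from typing import Callable


class Case(Enum):
    HIGH = "highly rated"
    MODERATE = "moderately rated"
    POOR = "poorly rated"


# Assumed case-to-score mapping; only the ordering (higher is better)
# comes from the documentation.
CASE_SCORES = {Case.HIGH: 1.0, Case.MODERATE: 0.5, Case.POOR: 0.0}


def score_language_features(response: str, grader: Callable[[str], Case]) -> float:
    """Ask the grader which of the three cases applies, then map it to a score."""
    case = grader(response)
    return CASE_SCORES[case]


def toy_grader(response: str) -> Case:
    """Hypothetical stand-in for the LLM judge: a crude surface heuristic."""
    text = response.strip()
    if not text:
        return Case.POOR
    # Capitalized and properly terminated text is treated as "highly rated".
    if text[0].isupper() and text[-1] in ".!?":
        return Case.HIGH
    return Case.MODERATE
```

Mapping a small set of discrete cases to fixed scores makes the judge's output easy to compare across responses. Asking for a free-form numeric score would be less consistent.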