Each LLM application has unique needs, so no one-size-fits-all evaluation tool can cover them all.

A sales assistant bot needs to be evaluated differently than a calendar automation bot.

Custom prompts help you grade your model the way you want.

Parameters:

  • prompt: Evaluation prompt used to generate the grade
  • choices: List of choices/grades to choose from
  • choice_scores: Score associated with each choice
  • eval_type: One of ["classify", "cot_classify"]; determines whether chain-of-thought prompting is applied before the grade is chosen (a sketch showing this option follows the example below)
  • prompt_var_to_column_mapping (optional): Mapping between the variables defined in the prompt and the column names in the data

How to use it?

prompt = """
You are an expert medical school professor specializing in grading students' answers to questions.
You are grading the following question:
{question}
Here is the real answer:
{ground_truth}
You are grading the following predicted answer:
{response}
"""

# Create a list of choices
choices = ["Correct", "Correct but Incomplete", "Incorrect"]

# Create scores for the choices
choice_scores = [1.0, 0.5, 0.0]

data = [{
    "user_question": "What causes diabetes?",
    "ground_truth_response": "Diabetes is a metabolic disorder characterized by high blood sugar levels. It is primarily caused by a combination of genetic and environmental factors, including obesity and lack of physical activity.",
    "user_response": "Diabetes is primarily caused by a combination of genetic and environmental factors, including obesity and lack of physical activity."
}]

prompt_var_to_column_mapping = {
    "question": "user_question",
    "ground_truth": "ground_truth_response",
    "response": "user_response"
}
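
To build intuition for what this mapping does, here is a rough, illustrative sketch of how one data row would be substituted into the prompt template. This is only for understanding: UpTrain performs the substitution internally, and the rendering shown here is an assumption about the general idea, not the library's exact logic.

# Illustration only: substitute the mapped columns of one row into the prompt.
row = data[0]
rendered = prompt.format(
    **{prompt_var: row[column] for prompt_var, column in prompt_var_to_column_mapping.items()}
)
print(rendered)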

from uptrain import CustomPromptEval, EvalLLM, Settings
import json

OPENAI_API_KEY = "sk-*****************"  # Insert your OpenAI key here
eval_llm = EvalLLM(settings=Settings(openai_api_key=OPENAI_API_KEY, response_format={"type":"json_object"}))

results = eval_llm.evaluate(
    data=data,
    checks=[CustomPromptEval(
        prompt=prompt,
        choices=choices,
        choice_scores=choice_scores,
        prompt_var_to_column_mapping=prompt_var_to_column_mapping
    )]
)

# Print the graded results (see the sample response below)
print(json.dumps(results, indent=3))
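
The check above uses the default evaluation behaviour. If you want the grading model to reason step by step before it picks a choice, you can pass the eval_type parameter described earlier. The snippet below is a small sketch of that option; the remaining arguments are the same ones defined above.

# Sketch: enabling chain-of-thought grading via eval_type.
results_cot = eval_llm.evaluate(
    data=data,
    checks=[CustomPromptEval(
        prompt=prompt,
        choices=choices,
        choice_scores=choice_scores,
        eval_type="cot_classify",  # "classify" skips the chain-of-thought step
        prompt_var_to_column_mapping=prompt_var_to_column_mapping
    )]
)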
By default, we use GPT-3.5 Turbo. If you want to use a different model, check out this tutorial.
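
As a rough sketch, switching models is typically done through the Settings object. The model identifier below is only an example; the supported names are covered in the tutorial linked above.

# Sketch: pointing the evaluator at a different model via Settings.
# The model name below is illustrative; consult the tutorial for supported values.
eval_llm_gpt4 = EvalLLM(settings=Settings(model="gpt-4", openai_api_key=OPENAI_API_KEY))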

Sample Response:

[
   {
      "Choice": "CORRECT BUT INCOMPLETE",
      "Explanation": "The predicted answer correctly identifies the primary causes of diabetes as genetic and environmental factors, including obesity and lack of physical activity. However, it does not mention that diabetes is a metabolic disorder characterized by high blood sugar levels, which is an important aspect of the real answer.",
      "score_custom_prompt": 0.5
   }
]

Here, we have evaluated the data according to the prompt defined above.

Although the response seems correct, it does not answer the question completely, given the information in the ground-truth answer.
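
If you want to consume the grades programmatically rather than read the printed JSON, each entry in results carries the fields shown above, so you can pull out the scores directly (a small sketch, assuming the field names from the sample response):

# Sketch: reading the chosen grade and score from each evaluated row.
for row in results:
    print(row["Choice"], row["score_custom_prompt"])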
