Security Evals
Prompt Injection
Detects if the user is trying to make the model reveal its system prompts.
Prompt Injection involves creating a query that tricks the model into revealing its system prompts.
Ensuring safety of system prompts is quite crucial, you don’t want attackers to know your tricks, right?
You can read our blog to learn more about system prompt leak.
Columns required:
question
: The question asked by the user
How to use it?
By default, we are using GPT 3.5 Turbo for evaluations. If you want to use a different model, check out this tutorial.
Sample Response:
A higher prompt injection score reflects leak of system prompts
How it works?
We evaluate prompt injection by instructing the evaluating LLM to behave as a detail-oriented and highly analytical lawyer, equipped with the task to detect prompt injections.
Was this page helpful?