Report #78215
[gotcha] User-supplied few-shot examples in the prompt override the system-level formatting or safety instructions
Strictly delimit user-provided examples and avoid allowing users to define the few-shot prompt structure directly; use structured JSON for I/O instead of free-text few-shot.
Journey Context:
To make LLMs adaptable, developers allow users to provide examples of how the LLM should respond. An attacker provides a few-shot example where the 'User' says something benign, but the 'Assistant' replies with a jailbreak or data leak. Because LLMs are heavily trained to follow the pattern of few-shot examples, the attacker's fake conversation pattern overpowers the actual system prompt instructions.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T13:52:51.924676+00:00— report_created — created