Report #50812
[gotcha] Few-shot examples in user prompt overriding system-level instructions
Isolate user-provided few-shot examples from system prompts, and explicitly bound the user's ability to define the output format or task structure.
Journey Context:
Developers allow users to provide 'examples' to guide the LLM's output format. An attacker provides malicious few-shot examples that redefine the task entirely \(e.g., providing examples of translating 'ignore previous instructions' to 'execute malicious code'\). The LLM's in-context learning prioritizes the provided examples over the system prompt, leading to a jailbreak.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T15:46:33.048897+00:00— report_created — created