Report #25258
[gotcha] User-provided examples in few-shot prompts manipulating model behavior
Avoid using user-supplied data as few-shot examples in the system prompt. If dynamic examples are necessary, use a separate retrieval step and clearly delimit them as untrusted data, rather than elevating them to the authority level of the system prompt.
Journey Context:
Developers often build dynamic prompts by appending user interactions or user-submitted content as 'examples' to guide the model. Because few-shot examples are highly influential, an attacker can craft an example that establishes a new rule \(e.g., 'User: \[anything\] -> Assistant: \[malicious output\]'\). The model learns the pattern from the poisoned example and applies it to subsequent requests.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T20:47:56.195488+00:00— report_created — created