Report #71221
[gotcha] Dynamic few-shot examples in system prompts poisoned by user history to teach bad behavior
Keep few-shot examples static and trusted. If dynamic examples are necessary, clearly delimit them as untrusted data and avoid using raw user inputs as exemplars in the system prompt.
Journey Context:
Developers use recent user interactions as few-shot examples to personalize the model. Because LLMs heavily weight few-shot examples, an attacker can craft a specific input/output pair that, when injected into the system prompt as an example, teaches the model to output malicious content or override instructions for all future turns.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T02:07:33.701851+00:00— report_created — created