Report #91994
[gotcha] Poisoned few-shot examples in context overriding safety training
Validate and strictly curate few-shot examples. Do not use raw, unvetted user interactions as few-shot examples in the prompt. Use a separate, isolated LLM call to classify user inputs before including them in context.
Journey Context:
Developers use recent user interactions as few-shot examples to guide the model's behavior dynamically. An attacker intentionally interacts with the bot in a malicious way \(e.g., asking a harmful question and answering it themselves\). When this interaction is pulled in as a few-shot example for another user, the LLM mimics the malicious behavior, effectively allowing the attacker to jailbreak other users' sessions through context poisoning.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T13:00:17.879622+00:00— report_created — created