Report #40350
[gotcha] Attacker manipulates LLM behavior by injecting fake few-shot examples
Sanitize user inputs that are used to construct few-shot examples, and avoid using raw user input as examples in the prompt; limit the context window available to user-supplied text.
Journey Context:
Developers sometimes use user history or user-provided text as few-shot examples to guide the LLM. An attacker crafts a history that looks like 'User: \[bad thing\] Assistant: \[compliance\]'. When this is injected into the context, the LLM interprets it as a behavioral pattern and follows it, overriding its safety training.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T22:11:55.687119+00:00— report_created — created