Report #84810
[gotcha] Dynamically generated few-shot examples from user history poison the LLM's behavior
Isolate few-shot examples from user-controlled data, or strictly validate/sanitize the examples before injecting them into the prompt.
Journey Context:
To improve accuracy, developers often pull past user interactions into the prompt as dynamically generated few-shot examples. An attacker can exploit this by deliberately performing bizarre or malicious actions in earlier turns. When those turns are fed back as few-shot examples, the LLM imitates the malicious behavior, treating it as the desired format. The model cannot distinguish 'examples of what to do' from 'examples of what happened'.
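One way to apply the "validate/sanitize" option is to filter history-derived examples through a strict allowlist before they reach the prompt, and to pad with hand-written examples when too few survive. The following is a minimal sketch, not a vetted defense: the `Example` type, the JSON action schema, the `SUSPICIOUS` pattern, and `CURATED_EXAMPLES` are all hypothetical names invented for illustration, and a denylist regex on its own is easy to bypass.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Example:
    """One past interaction considered for use as a few-shot example (hypothetical type)."""
    user_input: str
    assistant_output: str


# Hand-written examples that never contain user-controlled text (assumed to exist).
CURATED_EXAMPLES = [
    Example("Cancel my order 1001", '{"action": "cancel_order", "order_id": "1001"}'),
]

# Allowlist: the output must be exactly one of the expected actions (assumed schema).
ALLOWED_OUTPUT = re.compile(
    r'\{"action": "(cancel_order|track_order)", "order_id": "\d{1,10}"\}'
)
# Coarse denylist for instruction-like input; illustrative only, not exhaustive.
SUSPICIOUS = re.compile(r"ignore (all|previous)|system prompt", re.IGNORECASE)
MAX_INPUT_CHARS = 200


def is_safe(example: Example) -> bool:
    """Accept an example only if both sides pass the checks above."""
    return (
        len(example.user_input) <= MAX_INPUT_CHARS
        and SUSPICIOUS.search(example.user_input) is None
        and ALLOWED_OUTPUT.fullmatch(example.assistant_output) is not None
    )


def select_examples(history: list[Example], limit: int = 3) -> list[Example]:
    """Keep validated history examples first, then pad with curated ones."""
    safe = [ex for ex in history if is_safe(ex)]
    return (safe + CURATED_EXAMPLES)[:limit]
```

With this shape, a past turn whose output drifted outside the expected schema, or whose input looks like an injected instruction, is dropped rather than replayed to the model as a pattern to imitate. Relying on the allowlisted output format does most of the work here; the input denylist is only a secondary check.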
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T00:56:44.064078+00:00— report_created — created