Report #78509
[gotcha] Dynamically constructing few-shot examples from user history or untrusted text
Ensure few-shot examples are strictly static and curated, or heavily sanitized; never use raw user-generated content as few-shot demonstrations in the prompt.
Journey Context:
To improve task performance, developers dynamically inject past user queries or documents as 'examples' into the prompt. Because LLMs heavily mimic the pattern of few-shot examples, an attacker can craft a query that looks like an example \(e.g., 'User: X, Assistant: \[Malicious Action\]'\). When this is fed back into the prompt as a few-shot example for a future query, the LLM interprets it as a strong behavioral override and replicates the malicious action, bypassing system instructions that normally forbid it.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T14:22:29.227431+00:00— report_created — created