Report #36361
[gotcha] Dynamic few-shot examples poisoned to manipulate LLM behavior
Curate few-shot examples statically or draw them only from highly trusted sources. If dynamic examples are necessary (e.g., retrieved from a vector database), moderate and classify them before injecting them into the prompt, exactly as you would untrusted user input.
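A minimal sketch of screening retrieved examples before they reach the prompt. The `Example` type, the `SUSPICIOUS_PATTERNS` list, and both function names are hypothetical and not taken from any library. The regex check only stands in for whatever moderation or classification step you actually run on user input, and on its own it would be easy to evade.

```python
import re
from dataclasses import dataclass

# Hypothetical deny-list for illustration only; a real pipeline would call
# the same moderation/classification step it applies to user input.
SUSPICIOUS_PATTERNS = [
    re.compile(r"https?://", re.IGNORECASE),  # any link inside an example
    re.compile(r"ignore (all |any )?(previous|prior) instructions", re.IGNORECASE),
    re.compile(r"\byou are now\b", re.IGNORECASE),  # persona override attempts
]


@dataclass(frozen=True)
class Example:
    question: str
    answer: str


def is_safe(example: Example) -> bool:
    """Return False if any deny-list pattern appears in the example."""
    text = f"{example.question}\n{example.answer}"
    return not any(p.search(text) for p in SUSPICIOUS_PATTERNS)


def build_prompt(retrieved: list[Example], user_question: str) -> str:
    """Drop unsafe examples, then format the rest as few-shot pairs."""
    safe = [ex for ex in retrieved if is_safe(ex)]
    shots = "\n\n".join(f"Q: {ex.question}\nA: {ex.answer}" for ex in safe)
    return f"{shots}\n\nQ: {user_question}\nA:"
```

Filtering at prompt-build time means an entry that slipped into the index is still checked every time it is retrieved.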
Journey Context:
Developers use RAG to fetch relevant examples for the prompt (e.g., 'Here are examples of good customer support responses'). An attacker submits a support ticket that gets indexed. When a later user asks a question, the attacker's ticket is retrieved as a 'few-shot example' that instructs the LLM to output malicious links or adopt a toxic persona. The model follows the pattern set by the examples, even when they are malicious.
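One way to close the path described above is to record provenance at indexing time and only let trusted entries serve as examples. The sketch below assumes each indexed document carries a `source` label. The `IndexedDoc` type, the `TRUSTED_SOURCES` set, and the label values are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical provenance labels; only staff-curated entries may be examples.
TRUSTED_SOURCES = {"staff_curated"}


@dataclass(frozen=True)
class IndexedDoc:
    text: str
    source: str  # assumed to be set by the indexing pipeline, not by the submitter


def eligible_examples(hits: list[IndexedDoc]) -> list[IndexedDoc]:
    """Keep only retrieval hits whose provenance is on the allowlist."""
    return [doc for doc in hits if doc.source in TRUSTED_SOURCES]
```

With this filter, a user-submitted ticket can still be indexed for search but is never promoted to a few-shot example, provided the `source` field cannot be set by the submitter.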
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T15:30:26.591492+00:00— report_created — created