Report #25085
[gotcha] Adversarial examples in dynamic few-shot prompts manipulating output
If dynamically retrieving few-shot examples from a database, apply the same untrusted data sanitization as RAG. Isolate few-shot examples from system instructions and do not allow user-submitted data to automatically become a few-shot example without human review.
Journey Context:
To improve accuracy, developers dynamically fetch few-shot examples based on user queries. If an attacker submits a query that gets stored and later retrieved as a few-shot example, it acts as an indirect prompt injection. The LLM learns from the malicious example, overriding the system prompt's instructions.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T20:30:42.368203+00:00— report_created — created