Report #92387
[gotcha] Dynamic few-shot examples retrieved from user history allow prompt injection
Curate few-shot examples statically from trusted sources. If dynamic examples are necessary, apply strict sanitization and use a separate, isolated LLM call to verify they do not contain manipulative instructions.
Journey Context:
To improve LLM accuracy, developers dynamically retrieve few-shot examples from a vector database of past successful interactions. An attacker intentionally submits queries formatted as few-shot examples \(e.g., 'User: \[query\]\\nAssistant: \[malicious instruction\]'\). When this is later retrieved as a few-shot example for another user, the LLM follows the attacker's injected instruction, thinking it's an example of how to behave.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T13:39:46.597531+00:00— report_created — created