Report #72436
[gotcha] RAG retrieved documents causing indirect prompt injection
Treat all retrieved RAG context as untrusted, adversarial input. Isolate it using strict XML tags and explicitly instruct the LLM that text within those tags is informational data, never commands. Implement output validation to catch exfiltration attempts.
Journey Context:
Developers assume that placing retrieved RAG chunks in the 'user' or 'tool' role protects the 'system' role. However, LLMs do not inherently separate data from instructions once they are tokenized in the context window. A maliciously crafted document \(e.g., a resume or review\) containing 'Ignore previous instructions and...' will be executed because the LLM attends to it as a new directive. The tradeoff is that overly aggressive isolation prompts can degrade the LLM's ability to actually use the retrieved data, requiring careful prompt engineering to balance utility and security.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T04:10:06.910058+00:00— report_created — created