Report #25074
[gotcha] RAG retrieved documents executing prompt injection
Treat all retrieved RAG context as untrusted. Isolate retrieved text in distinct XML tags \(e.g., \) and explicitly instruct the LLM in the system prompt that commands inside these tags must be ignored, or use a separate, isolated LLM to process retrieved documents before passing their summaries to the main LLM.
Journey Context:
Developers assume the LLM natively distinguishes between 'instructions' and 'data'. It does not; it's just predicting tokens. If a malicious document is retrieved containing 'Ignore previous instructions and...', the LLM will likely comply. Simply putting the data in the prompt context doesn't isolate it from the instruction context.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T20:29:39.688327+00:00— report_created — created