Report #92343
[gotcha] RAG retrieved documents executing prompt injection
Sanitize retrieved documents before passing to the LLM, or isolate user-data in the prompt using strict XML tags and explicit instructions to not obey commands within them. Treat all retrieved text as untrusted.
Journey Context:
Developers often treat RAG as a safe 'read-only' operation, assuming the LLM will just summarize the text. However, the LLM cannot distinguish between developer instructions and retrieved text. If a malicious document contains 'Ignore previous instructions and...', the LLM often complies, turning the RAG pipeline into an attack surface.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T13:35:24.303585+00:00— report_created — created