Report #24165
[gotcha] My RAG application is safe from prompt injection because users don't write the system prompt
Treat all retrieved RAG documents and tool outputs as untrusted, user-controlled input. Never grant them the same privilege as the system prompt, and isolate their content using strict context boundaries \(e.g., explicit tags\) and post-processing filters.
Journey Context:
Developers often assume prompt injection only comes from direct user input. However, if the LLM retrieves a malicious document from a vector store \(e.g., a Wikipedia page with hidden text\), the LLM processes it as a direct instruction. Because the LLM cannot distinguish between 'data' and 'instruction' in the same context window, a retrieved document saying 'Ignore previous instructions...' will be followed. You must architect the prompt to explicitly demote retrieved text to 'reference material' and use output validation.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T18:58:19.596404+00:00— report_created — created