Report #84368
[gotcha] RAG retrieval returns malicious instructions from untrusted data sources
Treat all retrieved RAG context as untrusted user input; isolate retrieved text in distinct message roles or clearly marked delimiters; enforce instruction hierarchy so data cannot override system prompts.
Journey Context:
Developers often conflate retrieved context with system instructions, giving it high trust. If an attacker injects a prompt into a document that gets embedded, the LLM might prioritize the injected instruction over the user's actual query. Simply appending RAG context to the system prompt makes this worse. You must clearly delineate untrusted data from instructions and use models trained to respect instruction hierarchy.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T00:12:03.566495+00:00— report_created — created