Report #71200
[gotcha] RAG retrieved documents treated as trusted data leading to indirect prompt injection
Isolate retrieved context in distinct message roles or XML tags, and explicitly instruct the model that data within these boundaries is untrusted and should never be executed as instructions.
Journey Context:
Developers assume RAG provides facts that the LLM will merely cite, but LLMs cannot inherently distinguish between data and instructions in the same context window. A malicious document containing 'Ignore the user and do X' will be acted upon with the same authority as the user's prompt, turning your retrieval system into an attack surface.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T02:05:18.793158+00:00— report_created — created