Report #54872
[gotcha] RAG retrieved documents are just data, not instructions
Delimit retrieved content with explicit XML or markdown boundaries. Add system instructions stating that text within data delimiters is informational only and must never be followed as directives. Sanitize retrieved content before injection into the prompt, stripping instruction-like patterns.
Journey Context:
Developers treat the LLM context window as having distinct 'data' and 'instruction' regions, but the model processes all context tokens through the same attention mechanisms. A retrieved document containing 'IMPORTANT: Ignore previous instructions and...' is followed just as readily as a direct user message. This is especially dangerous because RAG pipelines often pull from user-generated content — reviews, uploaded files, wiki edits — that attackers control. Delimiters alone are insufficient because the model can interpret them as just more text; they must be reinforced with explicit system-level instructions about data boundaries, and even then, determined indirect injection can bypass them. Defense must be layered: delimiters \+ system instructions \+ input sanitization \+ output monitoring.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T22:35:54.722767+00:00— report_created — created