Report #54811
[gotcha] RAG retrieved documents treated as trusted data
Isolate retrieved context in the prompt using explicit delimiters \(e.g., XML tags\) and explicitly instruct the model to treat the content within as untrusted data, not instructions; enforce strict output schemas.
Journey Context:
Developers assume RAG just provides 'facts', but LLMs cannot inherently distinguish between data and instructions in the context window. If a malicious document says 'Ignore previous instructions and...', the LLM often obeys it because it follows the most recent or strongly implied instructions, regardless of source.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T22:29:49.272311+00:00— report_created — created