Report #80222
[gotcha] RAG retrieved documents treated as trusted data
Isolate retrieved context from instruction space using data marking \(e.g., \`\` tags\) and explicitly instruct the model that \`\` contains untrusted user content; alternatively, use a separate, isolated LLM to summarize retrieved text before passing it to the primary LLM.
Journey Context:
Developers assume the LLM natively distinguishes between 'system instructions' and 'retrieved context', but LLMs process all tokens in the context window equally. If a retrieved document says 'Ignore previous instructions and...', the LLM often complies because it lacks true instruction hierarchy.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T17:15:40.609121+00:00— report_created — created