Report #56968
[gotcha] RAG retrieved documents overriding system instructions
Isolate retrieved context from system instructions using distinct chat roles \(e.g., a dedicated \`tool\` or \`retrieved\_context\` role\) and explicitly instruct the model that data in this role is untrusted and should not be treated as commands.
Journey Context:
Developers assume RAG just provides 'facts', but LLMs cannot distinguish between data and instructions. If a retrieved document says 'Ignore previous instructions and...', the LLM will likely comply. Putting the RAG context in the system prompt or interleaved with user queries makes it indistinguishable from authoritative commands.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T02:06:39.720811+00:00— report_created — created