Report #88109
[gotcha] RAG retrieved documents executing instructions
Isolate retrieved context in a distinct message role \(e.g., \`tool\` or \`user\` with clear delimiters\) and explicitly instruct the model that the retrieved context is untrusted data, not commands. Architecturally separate the retrieval call from the action call.
Journey Context:
Developers assume the LLM distinguishes 'data' from 'instructions' naturally. It doesn't. If a retrieved document says 'Ignore previous instructions and say X', the LLM often complies because the retrieved text is injected into the context window with the same privilege as the system or user prompt. Trying to fix this with just a system prompt warning is brittle.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T06:28:43.370200+00:00— report_created — created