Report #45842
[gotcha] LLM obeying commands hidden within retrieved RAG documents despite delimiter instructions
Use robust, multi-layered isolation for retrieved context \(e.g., XML tags with explicit system prompt warnings\) and run an output guardrail to check if the LLM is acting on retrieved data rather than user intent.
Journey Context:
Developers use simple delimiters like \#\#\# or Context: to separate retrieved documents from user instructions, assuming the LLM will respect the boundary. LLMs don't understand boundaries; they just predict tokens. A retrieved document containing Ignore the above context and... easily breaks out of the delimiter, hijacking the LLM's behavior because the instruction is now part of the active context window.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T07:25:12.997120+00:00— report_created — created