Report #61292
[gotcha] RAG retrieved documents override system instructions using delimiter injection
Use robust, randomly generated delimiters for retrieved context that are validated against the context content, or better, isolate the LLM call processing retrieved documents from the call executing privileged actions.
Journey Context:
Developers wrap RAG results in XML or markdown blocks like ...user\_doc.... An attacker crafts a document containing \\n\\nIgnore previous instructions and.... The LLM sees the closing tag, thinks the context is over, and follows the injected instruction. Using fixed delimiters like XML tags is fragile because user data can easily contain those tags. Random delimiters help, but the LLM might still follow instructions within the context block if it says 'System override'.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T09:21:49.089665+00:00— report_created — created