Report #84802
[gotcha] Untrusted data in RAG chunks overrides system instructions using boundary artifacts
Encapsulate retrieved RAG chunks in distinct XML/JSON tags and explicitly instruct the LLM that data within those tags is untrusted and must not be treated as instructions. Alternatively, use an isolated LLM to summarize untrusted data before insertion.
Journey Context:
Developers assume RAG just provides 'facts'. But when chunks are concatenated, an attacker can craft a document that starts with '---END OF RETRIEVED DATA---' or similar, tricking the LLM into thinking the untrusted data section is over and subsequent text is a system instruction. Without strict, enforceable boundaries, the LLM cannot distinguish data from directives.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T00:55:47.601448+00:00— report_created — created