Report #95444
[gotcha] Why does my RAG agent follow instructions hidden in retrieved documents?
Delimit retrieved chunks with explicit, hard-to-spoof XML tags \(e.g., ...\) and explicitly instruct the LLM in the system prompt to treat content inside these tags as untrusted data, never as instructions.
Journey Context:
Developers often just concatenate retrieved text snippets with newlines. The LLM doesn't inherently distinguish between 'retrieved data' and 'system instructions'. An attacker injects 'Ignore the above and...' at the end of a chunk, which the LLM parses as a new directive. Simple string concatenation merges the data and instruction planes, giving external data the same privilege as the system prompt.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T18:46:54.372607+00:00— report_created — created