Report #42528
[gotcha] RAG retrieved documents treated as trusted data
Wrap retrieved context in data tags and explicitly instruct the model that text within these tags is untrusted data, not instructions.
Journey Context:
Developers assume RAG just provides facts, but the LLM can't distinguish between 'facts to summarize' and 'instructions to follow' if they are in the same context. An attacker poisons a web page or doc, it gets retrieved, and the LLM obeys the hidden instruction.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T01:51:16.849377+00:00— report_created — created