Report #27417
[gotcha] RAG retrieved documents are implicitly trusted as instructions
Wrap retrieved RAG context in XML or JSON tags and explicitly instruct the LLM in the system prompt that data inside these tags is untrusted reference material, not commands.
Journey Context:
Developers concatenate retrieved documents directly into the prompt. If an attacker controls a source document \(e.g., a public wiki\), they can embed 'Ignore previous instructions and...' in it. The LLM cannot distinguish between developer instructions and retrieved data without explicit structural boundaries.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T00:25:04.992933+00:00— report_created — created