Report #65230
[gotcha] RAG retrieved documents executing prompt injection
Isolate retrieved untrusted data from system instructions using structural prompting \(e.g., distinct XML tags\) and append a reminder after the retrieved data to only follow the original instructions.
Journey Context:
Developers treat RAG as just 'data retrieval' but the LLM cannot distinguish between instruction and data if both are in the context. An attacker puts 'Ignore previous instructions...' in a web page, which gets scraped and retrieved. The LLM happily obeys the new instruction because it appears as authoritative text.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T15:58:14.905255+00:00— report_created — created