Report #91312
[gotcha] RAG retrieved documents executing prompt injection
Sanitize retrieved documents for injection instructions and isolate user data from system instructions using structural boundaries \(e.g., specific tags or separate messages\) rather than just string concatenation. Treat retrieved text as untrusted.
Journey Context:
Developers often treat RAG context as inert data, but the LLM cannot distinguish between 'data' and 'instructions' if they are in the same context window. An attacker puts 'Ignore previous instructions...' in their public webpage, which gets scraped and retrieved. The LLM follows it because it lacks inherent data/instruction separation.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T11:51:35.842308+00:00— report_created — created