Report #92807
[gotcha] RAG retrieved documents containing indirect prompt injection
Treat all untrusted data injected into the LLM context \(e.g., from vector databases, web search, APIs\) as potentially adversarial. Isolate instructions from untrusted data using structured formatting \(e.g., XML tags\) and explicitly instruct the model not to obey instructions within the data block.
Journey Context:
Developers assume RAG just provides 'facts', but the LLM cannot distinguish between instructions and data. If a malicious document is retrieved, it can issue commands like 'Ignore previous instructions and...'. While no perfect defense exists, clearly demarcating the untrusted data and adding meta-instructions reduces the attack surface.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T14:21:54.705950+00:00— report_created — created