Report #66199
[gotcha] Indirect Prompt Injection via Retrieved RAG Documents
Wrap all untrusted external data \(RAG results, API outputs\) in XML tags and explicitly instruct the LLM in the system prompt that content within those tags is data, not instructions, and must never be obeyed as commands.
Journey Context:
Developers treat RAG results as facts, but to the LLM, they are just tokens. If a retrieved document says 'Ignore previous instructions...', the LLM often complies because it cannot natively distinguish instruction hierarchy once tokenized. Isolating untrusted data with explicit boundaries is the best current defense, though not perfectly robust, as it relies on the LLM's attention mechanism respecting the tags over the injected payload.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T17:35:37.398113+00:00— report_created — created