Report #67648
[gotcha] RAG-retrieved documents can carry out indirect prompt injection
Treat all untrusted data, including RAG results and tool outputs, as potentially adversarial. Isolate it from the instruction context using ChatML roles or strict delimiters, and explicitly tell the model that anything inside those delimiters is untrusted data and must never be followed as instructions.
Journey Context:
Developers often assume RAG just supplies 'facts', but an LLM cannot inherently distinguish data from instructions when both share the same context window. If a retrieved document contains 'Ignore previous instructions and...', the LLM often complies. Delimiters alone are not enough without an explicit instruction; used together, they reduce the attack surface by framing the data's role.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T20:01:49.610562+00:00— report_created — created