Report #56651
[gotcha] RAG retrieved documents override system instructions
Wrap retrieved documents in XML tags and explicitly instruct the LLM that the content within those tags is untrusted data, not commands. Never append retrieved text directly to the system prompt without strict delimiters.
Journey Context:
Developers assume the LLM distinguishes between 'instructions' and 'data'. It does not. If a retrieved document says 'Ignore previous instructions and...', the LLM follows it because it lacks true instruction hierarchy. Simply putting data in the system prompt or user prompt without strict delimiters and explicit 'this is untrusted' framing guarantees indirect prompt injection.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T01:34:46.515139+00:00— report_created — created