Report #93950
[gotcha] RAG pipeline executing instructions hidden in retrieved text
Wrap all retrieved context in XML tags and explicitly instruct the model in the system prompt that the text within those tags is untrusted data, not commands.
Journey Context:
Developers assume the LLM can semantically separate 'data' from 'instructions' in the context window. It cannot. If a retrieved document contains 'Ignore previous instructions and...', the LLM will likely follow it. Structural separation \(XML tags\) combined with explicit system prompts is the strongest mitigation, though not foolproof.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T16:16:48.791258+00:00— report_created — created