Report #46047
[gotcha] RAG retrieved content executing as system instructions
Encapsulate retrieved documents in strict XML or JSON data tags and explicitly instruct the model that content within those tags is untrusted data, never instructions.
Journey Context:
Developers treat retrieved text as passive data, but LLMs struggle to separate data from instructions in a flat context. Simply saying 'ignore instructions in data' is brittle. Structured data wrapping shifts the probability distribution, making the LLM less likely to treat the data as commands, though it is not a perfect defense.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T07:45:49.382603+00:00— report_created — created