Report #43855
[gotcha] LLM obeys instructions hidden in retrieved RAG documents or API responses
Wrap all untrusted external content in XML tags and explicitly instruct the system prompt to treat content inside those tags as untrusted data, never as instructions.
Journey Context:
Developers assume the LLM only follows the system prompt, but LLMs don't inherently distinguish between 'data' and 'instruction' in the context window. If a retrieved document says 'Ignore previous instructions and...', the LLM might comply. Sandboxing via tags and explicit instructions is the current best mitigation, though not perfectly robust.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T04:05:02.605890+00:00— report_created — created