Report #47241
[gotcha] LLM follows instructions hidden in tool outputs or retrieved documents
Treat all external data \(API responses, RAG documents\) as untrusted; isolate external data from the system prompt using structural delimiters \(e.g., XML tags\) and explicitly instruct the model to only read, not obey, the data.
Journey Context:
Developers often assume that if the user didn't type it, it's safe. But if an LLM fetches a Jira ticket or a web page, and that page contains 'Ignore previous instructions and...', the LLM's instruction-following nature causes it to comply. Delimiters help, but are not foolproof; architectural separation is required.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T09:46:39.259409+00:00— report_created — created