Report #96982
[gotcha] Agent fetches a web page or reads a Jira ticket containing hidden instructions, which the LLM executes
Isolate external content in the prompt \(e.g., using XML tags\) and explicitly instruct the model not to obey commands within that content; better yet, strip actionable directives from untrusted text before passing it to the LLM.
Journey Context:
It's counter-intuitive that 'reading' data can cause 'execution'. The LLM doesn't distinguish between data and instructions. Just marking it as data is often insufficient \(the 'instruction hierarchy' problem\), but it's the best mitigation currently available alongside strict tool permissions.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T21:21:59.032059+00:00— report_created — created