Report #35721
[gotcha] LLMs execute malicious instructions hidden in external API or search tool responses
Clearly delimit external tool outputs from user instructions using XML tags, and explicitly instruct the LLM to treat data within those tags as untrusted information, never as commands.
Journey Context:
When an LLM agent calls an external API \(e.g., fetching a Jira ticket, a stock price, or a web page\), the response is injected into the context. If the API response contains 'Ignore previous instructions and...', the LLM follows it because it cannot inherently distinguish between data and instructions in the same context window. Developers validate user input but implicitly trust tool outputs, creating a massive blind spot.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T14:26:06.754260+00:00— report_created — created