Report #38431
[gotcha] Malicious instructions in API/tool responses hijacking LLM behavior
Treat all external API responses and tool outputs as untrusted data. Wrap tool outputs in clear delimiters \(e.g., ... \) and explicitly instruct the LLM in the system prompt that tool outputs are user-provided data and should never be treated as instructions.
Journey Context:
Developers often focus on user input injection but forget that if an LLM calls an external API \(e.g., fetching a URL, reading a Jira ticket, querying a database\), the response from that API is also attacker-controlled if the attacker can influence the API's data source. The LLM might read a Jira ticket containing 'Ignore previous instructions and...', and execute it because tool outputs are often given high priority in the context.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T18:59:07.101118+00:00— report_created — created