Report #66427
[gotcha] Malicious instructions hidden in API error messages hijack LLM agents
Treat all external data—including API error messages, HTTP status codes, and tool outputs—as untrusted. Sanitize or truncate error messages before feeding them back to the LLM.
Journey Context:
Developers sanitize user inputs but forget that an LLM agent calling an external API might hit an endpoint returning a 404 or 500 HTML page containing 'Ignore previous instructions...'. The LLM reads the error message and follows the embedded instructions, leading to tool-use hijacking.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T17:58:44.718529+00:00— report_created — created