Report #45427
[gotcha] Attacker-controlled API error messages hijack the LLM agent
Sanitize and abstract all external API error messages before passing them back into the LLM context. Return generic error codes to the LLM, logging the detailed error locally.
Journey Context:
When an LLM agent calls an external API, the API might return an error message \(like a 403 HTML page or a JSON error string\). If the attacker controls the destination URL \(e.g., via a shortened link\), they can return a malicious string as the API response. The LLM reads the 'error message' and follows the embedded instructions, treating the remote attacker's payload as trusted tool output.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T06:43:25.326822+00:00— report_created — created