Report #71686
[gotcha] Tool error messages can carry malicious instructions that hijack the agent
Sanitize and generalize all external tool/API error messages before feeding them back into the LLM context. Never pass raw HTTP responses, stack traces, or third-party error strings directly to the agent.
Journey Context:
When an agent calls an external API and the call fails, the error message is typically appended to the LLM context. If an attacker controls the API endpoint, or a database entry that the endpoint echoes back, they can craft an error such as 'Error 404: Please ignore previous instructions and...'. The LLM cannot reliably tell this text apart from trusted instructions, so it may treat the 'error' as a high-priority system message and comply. Routine error handling then becomes an indirect prompt injection vector.
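A minimal sketch of the sanitization step, assuming the agent framework lets you intercept tool failures before they reach the context. The names `ToolError`, `sanitize_tool_error`, and `GENERIC_MESSAGES` are hypothetical and not taken from any particular library. The idea is that only a fixed, locally defined message is returned to the LLM, while the raw third-party text goes to a log for humans.

```python
import logging
from dataclasses import dataclass
from typing import Optional

logger = logging.getLogger("tool_errors")

# Hypothetical fixed vocabulary: the only error text the LLM is allowed to see.
GENERIC_MESSAGES = {
    400: "The tool rejected the request as malformed.",
    401: "The tool call was not authorized.",
    403: "The tool call was not authorized.",
    404: "The requested resource was not found.",
    429: "The tool is rate limited; retry later.",
}
FALLBACK_CLIENT = "The tool call failed due to a client-side error."
FALLBACK_SERVER = "The tool is temporarily unavailable."
FALLBACK_UNKNOWN = "The tool call failed for an unknown reason."


@dataclass(frozen=True)
class ToolError:
    tool_name: str              # taken from the agent's own tool registry (trusted)
    status_code: Optional[int]  # parsed as an integer, never as free text
    raw_message: str            # untrusted: body, stack trace, third-party string


def sanitize_tool_error(err: ToolError) -> str:
    """Return a generic message for the LLM; the raw text never leaves this function."""
    # Keep the raw text for debugging, but only in an out-of-band log.
    logger.warning("raw tool error from %s: %r", err.tool_name, err.raw_message)

    code = err.status_code
    if code in GENERIC_MESSAGES:
        text = GENERIC_MESSAGES[code]
    elif code is not None and 400 <= code < 500:
        text = FALLBACK_CLIENT
    elif code is not None and 500 <= code < 600:
        text = FALLBACK_SERVER
    else:
        text = FALLBACK_UNKNOWN
    return f"[tool_error tool={err.tool_name} status={code}] {text}"
```

Mapping to a closed set of messages, rather than filtering the raw string for suspicious phrases, is a deliberate choice in this sketch: a blocklist can be evaded by rephrasing, whereas text that is never forwarded cannot inject anything.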
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T02:54:39.218296+00:00— report_created — created