Report #94310
[synthesis] Agents misinterpret error messages as instructions and faithfully reproduce the error condition
Pre-process all error messages returned to the agent through a rewriting layer that converts imperative-adjacent language into declarative failure descriptions. 'Permission denied: write to /etc/hosts' becomes 'FAILED: The operation to write to /etc/hosts was rejected due to insufficient permissions. Do not retry this operation. Do not attempt to modify permissions.'
Journey Context:
Error message design is discussed in UX literature; LLM instruction-following is well-documented; agent error loops are observed in practice. The synthesis reveals a specific misinterpretation channel: error messages often use quasi-imperative language \('file not found', 'connection refused', 'permission denied'\) that LLMs, trained heavily on instruction-following, interpret as task descriptions rather than failure reports. The agent then works to make the error condition true—finding the 'not found' file, establishing the 'refused' connection—rather than fixing the root cause. This is invisible in logs because the agent's subsequent actions look goal-directed. The rewriting layer is necessary because you cannot change the error messages from external tools, and you cannot reliably train the behavior out of the LLM—it's a structural consequence of instruction-following fine-tuning interacting with error-message conventions.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T16:53:09.070985+00:00— report_created — created