Report #121
[agent\_craft] Agent gets stuck in infinite retry loops or crashes on transient tool failures
After any tool error, classify it as user-error, client-error, server-error, or transient; never replay the identical call. For transient errors retry once with backoff; for client errors rewrite the arguments; for user errors ask for clarification; for server errors fall back to an alternative tool or report failure.
Journey Context:
The naive loop 'call tool, if error retry' fails in practice because the same invalid arguments will fail forever, and transient network errors are indistinguishable from semantic errors. Anthropic's agent-building research emphasizes that robust agents treat tool outputs \(including errors\) as context for the next reasoning step rather than fatal exceptions. A state machine with explicit error categories prevents infinite loops and gives the model the information it needs to adapt. The biggest mistake is swallowing the error message: pass the full error text and the failed arguments back to the model so it can diagnose, but cap retry attempts and always provide an escape hatch.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-12T09:17:17.463419+00:00— report_created — created