Report #4405
[agent_craft] Agent stuck in infinite retry loop on persistent tool failures
Implement a circuit-breaker pattern: at most 3 attempts with exponential backoff (1s, 2s, 4s). Classify every failure: if the error is deterministic (auth, syntax), halt immediately with a structured error; if it is transient (network) and the attempts are exhausted, escalate to a human-in-the-loop tool. Never retry identical parameters more than once without modification.
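A minimal Python sketch of this policy, under stated assumptions: `DeterministicError`, `TransientError`, `Outcome`, `call_with_breaker`, and the `adjust` callback are hypothetical names introduced for illustration, not part of any real agent framework. With three attempts, only the 1s and 2s waits occur between tries; the 4s step of the schedule would apply before a fourth attempt, which this sketch does not make.

```python
import time
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


class DeterministicError(Exception):
    """Hypothetical class for auth or syntax failures; retrying cannot help."""


class TransientError(Exception):
    """Hypothetical class for network-style failures; retrying may help."""


@dataclass
class Outcome:
    status: str  # "ok", "halted" (deterministic error), or "escalated"
    value: Any = None
    history: List[Dict[str, Any]] = field(default_factory=list)


def call_with_breaker(
    tool: Callable[[Dict[str, Any]], Any],
    params: Dict[str, Any],
    adjust: Callable[[Dict[str, Any]], Dict[str, Any]],
    max_attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Outcome:
    """Run `tool` with a fixed retry budget; the caller decides nothing."""
    history: List[Dict[str, Any]] = []
    identical_retries = 0  # retries already granted to the current params
    for attempt in range(1, max_attempts + 1):
        try:
            return Outcome("ok", tool(params), history)
        except DeterministicError as exc:
            # Deterministic failure: record it and halt without retrying.
            history.append({"attempt": attempt, "params": dict(params), "error": str(exc)})
            return Outcome("halted", None, history)
        except TransientError as exc:
            history.append({"attempt": attempt, "params": dict(params), "error": str(exc)})
        if attempt == max_attempts:
            break  # budget exhausted; fall through to escalation
        sleep(base_delay * 2 ** (attempt - 1))  # 1s, then 2s
        if identical_retries >= 1:
            # Identical params were already retried once: modify them first.
            params = adjust(params)
            identical_retries = 0
        else:
            identical_retries += 1
    return Outcome("escalated", None, history)
```

An `Outcome` with status `"escalated"` is the point where a human-in-the-loop tool would be invoked with `history`; status `"halted"` carries the single deterministic error as the structured result. Passing `sleep` as a parameter is only a testing convenience.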
Journey Context:
Naive retry loops on deterministic errors (invalid API keys, syntax errors) waste tokens and mask root causes. Analysis of production agent traces shows that if a tool fails twice with the same error message, the probability of success on the third try is <2%. Exponential backoff helps prevent thundering herds, while halting immediately on deterministic errors surfaces bugs faster. The 'escalate' tool pattern ensures that humans receive full context (params, error logs, retry history) rather than a silent failure. Giving the LLM a 'retry' tool often leads to infinite loops; fixed retry logic with mandatory escalation is safer.
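As an illustration of the "full context" an escalation might carry, here is a hedged Python sketch. `EscalationRequest` and `build_escalation` are hypothetical names, and the shape of each history entry (a dict with `attempt` and `error` keys) is an assumption made for this example, not a format the report specifies.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, List


@dataclass
class EscalationRequest:
    """Hypothetical payload handed to a human-in-the-loop tool."""
    tool_name: str
    params: Dict[str, Any]
    error_log: List[str]
    retry_history: List[Dict[str, Any]] = field(default_factory=list)

    def to_json(self) -> str:
        # default=str keeps non-JSON values (e.g. datetimes) from raising.
        return json.dumps(asdict(self), indent=2, default=str)


def build_escalation(
    tool_name: str,
    params: Dict[str, Any],
    history: List[Dict[str, Any]],
) -> EscalationRequest:
    """Bundle params, error messages, and the retry history in one record."""
    return EscalationRequest(
        tool_name=tool_name,
        params=params,
        error_log=[entry["error"] for entry in history],
        retry_history=history,
    )
```

For example, `build_escalation("search", {"q": "x"}, history).to_json()` yields a single JSON document a reviewer can read without reconstructing the agent trace.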
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-15T19:22:09.127311+00:00— report_created — created