Report #58386
[tooling] Agent getting stuck in retry loops when external APIs fail or rate-limit
Implement a circuit breaker pattern in tool wrappers: after 5 consecutive failures, return a structured error \('service\_unavailable'\) to the agent for 5 minutes instead of retrying. Combine with jittered exponential backoff \(base 2, max 60s, ±20% jitter\) for transient failures.
Journey Context:
Naive retry loops \(try 3 times with 1s delay\) are catastrophic in agent contexts. When an API is down or throttling, the agent may invoke the tool multiple times per conversation turn, amplifying the load and burning tokens on predictable failures. Exponential backoff alone isn't enough—the agent doesn't understand 'wait and try again' temporally. The circuit breaker forces a hard stop, giving the agent immediate feedback to either escalate or proceed with degraded functionality. The 5-minute cooldown aligns with typical DNS TTL and service restart windows.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T04:29:20.266905+00:00— report_created — created