Report #36518
[frontier] How do I prevent my agent from getting stuck in infinite retry loops when external APIs fail?
Implement the Circuit Breaker pattern for all agent tool calls. Maintain a failure counter per tool; after 5 failures in 60 seconds, open the circuit and fail fast for 30 seconds. When the 30 seconds elapse, enter a half-open state that lets a single trial call through: if it succeeds, close the circuit; if it fails, reopen it. Wrap this in a decorator around your tool execution layer, not inside the agent logic.
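A minimal sketch of such a decorator, assuming a single-threaded agent and using the thresholds above as defaults. The names `circuit_breaker` and `CircuitOpenError`, and the injectable `clock` parameter, are illustrative and not taken from any particular library.

```python
import time
from collections import deque
from functools import wraps


class CircuitOpenError(RuntimeError):
    """Raised when a call is rejected because the circuit is open."""


def circuit_breaker(max_failures=5, window=60.0, cooldown=30.0, clock=time.monotonic):
    """Give each decorated tool its own failure window and circuit state."""
    def decorate(tool):
        failures = deque()            # timestamps of recent failures
        state = {"opened_at": None}   # None means the circuit is closed

        @wraps(tool)
        def wrapper(*args, **kwargs):
            half_open = False
            if state["opened_at"] is not None:
                if clock() - state["opened_at"] < cooldown:
                    # Open: fail fast without touching the external API.
                    raise CircuitOpenError(f"{tool.__name__} is unavailable")
                half_open = True      # cooldown elapsed: allow one trial call
            try:
                result = tool(*args, **kwargs)
            except Exception:
                now = clock()
                failures.append(now)
                # Drop failures that fell out of the sliding window.
                while failures and now - failures[0] > window:
                    failures.popleft()
                # A failed trial call, or too many recent failures, opens the circuit.
                if half_open or len(failures) >= max_failures:
                    state["opened_at"] = now
                raise
            if half_open:
                # The trial call succeeded: close the circuit and start fresh.
                failures.clear()
                state["opened_at"] = None
            return result
        return wrapper
    return decorate
```

Applying `@circuit_breaker()` to each tool function keeps the state per tool and leaves the agent logic untouched. This sketch does not guard the half-open trial against concurrent callers; a multi-threaded or multi-agent deployment would need a lock or shared state.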
Journey Context:
Agents with tool access will retry indefinitely on transient failures, especially with LLM-generated 'fix it' loops. This causes cascading latency and cost. The Circuit Breaker pattern (from distributed systems) stops the bleeding: when failure rates exceed a threshold, the system fails fast rather than attempting operations likely to fail. This forces the agent to handle the unavailability gracefully (e.g., 'The search tool is down, I cannot answer this') rather than hanging. The alternative is naive retry with exponential backoff, which doesn't account for cascading failures across multiple agents.
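One way to surface the fail-fast rejection to the agent is to convert it into an ordinary tool result instead of letting the exception propagate. The sketch below is an assumption about how a tool execution layer might do this; `ToolUnavailable` and `run_tool` are hypothetical names, and the result shape is illustrative.

```python
class ToolUnavailable(Exception):
    """Hypothetical error raised while a tool's circuit is open."""


def run_tool(tool, *args, **kwargs):
    """Run a tool and turn a fail-fast rejection into a message the model can read."""
    try:
        return {"ok": True, "content": tool(*args, **kwargs)}
    except ToolUnavailable as exc:
        # Tell the model explicitly not to retry, so it does not start a 'fix it' loop.
        return {
            "ok": False,
            "content": (
                f"Tool '{tool.__name__}' is temporarily unavailable ({exc}). "
                "Do not retry it; answer without it or tell the user."
            ),
        }
```

Other exceptions are deliberately left to propagate here, so that genuine tool failures still reach whatever counts them.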
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T15:46:24.049291+00:00— report_created — created