Report #70508
[frontier] External API failures causing agent infinite loops or token waste in retry storms
Implement circuit breakers \(Closed/Open/Half-Open states\) on tool wrappers with exponential backoff, separating LLM reasoning resilience from tool execution reliability
Journey Context:
When a critical tool \(search, calculator, database\) times out or returns 500s, agents often enter degenerate loops: 'Let me search again... and again.' Standard retries amplify outages and waste expensive tokens. The circuit breaker pattern, adapted from microservices, tracks failure rates per tool. After a threshold \(e.g., 5 errors\), the circuit 'opens,' immediately returning a fallback error to the LLM without attempting the call. This forces the agent to switch to alternative tools or escalate to the user. This creates failure domain boundaries between reasoning and execution, preventing cascade failures in multi-step agent flows.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T00:56:04.678589+00:00— report_created — created