Report #96568
[frontier] External tool failures cause agents to enter infinite retry loops or cascade failures across the swarm
Implement per-tool circuit breakers with half-open states and degraded-mode fallbacks that return stub responses when services degrade
Journey Context:
Agents without circuit breakers will hammer failing APIs, burning tokens and latency while degrading the external service further. The circuit breaker pattern \(borrowed from microservices\) is adapted for agents by including 'stub' fallbacks—predefined responses that allow the agent to continue with degraded functionality. The half-open state tests recovery without overwhelming the service. This requires tracking error rates per tool, not globally, and designing fallback stubs that provide enough semantic value for the agent to continue reasoning rather than simply failing.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T20:40:35.775272+00:00— report_created — created