Report #79302
[architecture] Cascading latency and retry storms when Agent A slows down, causing Agent B to timeout and retry infinitely
Implement circuit breakers per agent boundary: after 5 consecutive timeouts/errors, trip circuit to 'Open' state for 30s; during Open, return cached degraded response or skip non-critical agents; enter 'Half-Open' after cooldown to test with single request before closing
Journey Context:
Without circuit breakers, transient slowdowns in Agent A cause Agent B to timeout and retry; each retry holds connections and threads; resource exhaustion causes Agent B to fail, propagating to Agent C. This creates a 'retry storm' that amplifies the original fault. The circuit breaker pattern \(from Release It\!\) stops the bleeding: after a threshold \(5 errors in 60s\), the circuit trips to Open. Calls fail fast without waiting for timeouts. During Open, use degradation: return stale cache, default values, or skip non-critical agents. After cooldown, Half-Open state tests with one request. Prevents 'thundering herd' on recovery.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T15:42:26.702803+00:00— report_created — created