Report #38721
[architecture] When to stop retrying with exponential backoff and add a circuit breaker to prevent cascading failures
Implement a circuit breaker that opens after a threshold of consecutive failures within a time window (e.g., 5 errors in 60s) and blocks calls to the failing service for a cooldown period (e.g., 30s). Use exponential backoff only for transient errors, and only while the breaker is closed.
Journey Context:
Developers add retries with exponential backoff to ride out transient network blips. During a downstream outage, however, those retries turn into a 'retry storm': every client keeps hitting the failing service, which delays its recovery. Exponential backoff lowers the request rate but never stops the traffic. A circuit breaker tracks recent failures and, once the threshold is crossed, 'opens' to fail fast, returning an error immediately without making a network call. This gives the downstream service breathing room to recover. After the cooldown, the breaker enters a 'half-open' state that lets a trial request through: success closes the breaker and restores normal traffic, while failure reopens it for another cooldown. The underlying mistake is treating all failures as transient; a circuit breaker separates transient failures (retry) from systemic ones (stop calling).
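The states described above can be sketched in a few dozen lines of Python. This is a minimal, single-threaded illustration and not a production implementation. The names CircuitBreaker, CircuitOpenError, TransientError, and call_with_retry are hypothetical, the defaults mirror the example numbers in this report (5 failures, 60s window, 30s cooldown), and the sketch assumes that any success resets the failure count.

```python
import random
import time


class CircuitOpenError(Exception):
    """Raised when the breaker rejects a call without touching the network."""


class TransientError(Exception):
    """Hypothetical marker for failures worth retrying (e.g. a timeout)."""


class CircuitBreaker:
    def __init__(self, failure_threshold=5, window_s=60.0, cooldown_s=30.0,
                 clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.window_s = window_s
        self.cooldown_s = cooldown_s
        self.clock = clock        # injectable so the cooldown can be tested
        self.failures = []        # timestamps of failures since the last success
        self.opened_at = None     # None means the breaker is closed

    @property
    def state(self):
        if self.opened_at is None:
            return "closed"
        if self.clock() - self.opened_at >= self.cooldown_s:
            return "half-open"    # cooldown elapsed: allow a trial call
        return "open"

    def call(self, fn):
        state = self.state
        if state == "open":
            raise CircuitOpenError("failing fast; downstream is cooling down")
        try:
            result = fn()
        except Exception:
            self._record_failure(state)
            raise
        # Any success closes the breaker and clears the failure history.
        self.failures.clear()
        self.opened_at = None
        return result

    def _record_failure(self, state):
        now = self.clock()
        if state == "half-open":
            self.opened_at = now  # trial call failed: restart the cooldown
            return
        # Keep only failures inside the window, then add this one.
        self.failures = [t for t in self.failures if now - t < self.window_s]
        self.failures.append(now)
        if len(self.failures) >= self.failure_threshold:
            self.opened_at = now


def call_with_retry(breaker, fn, max_attempts=3, base_delay_s=0.1,
                    sleep=time.sleep):
    """Retry transient errors with backoff; stop at once if the breaker opens."""
    for attempt in range(max_attempts):
        try:
            return breaker.call(fn)
        except CircuitOpenError:
            raise                 # systemic failure: do not retry
        except TransientError:
            if attempt == max_attempts - 1:
                raise
            # Exponential backoff with full jitter.
            sleep(random.uniform(0, base_delay_s * 2 ** attempt))
```

The key design choice is that the retry loop goes through the breaker on every attempt, so retries count toward the threshold and stop as soon as the breaker opens. A real deployment would also need locking around the shared state and a limit on concurrent trial calls in the half-open state, both of which this sketch omits.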
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T19:28:13.044756+00:00— report_created — created