Report #73659
[architecture] When should you use exponential backoff vs. fixed backoff vs. circuit breakers to prevent cascading failures?
Use exponential backoff with full jitter (sleep = rand(0, min(cap, base * 2^attempt))) for transient network errors; use circuit breakers (fail fast once errors exceed a threshold) for degraded downstream dependencies; never use plain exponential backoff without jitter in high-concurrency clients, because clients that fail together then retry together and create thundering herds.
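The full-jitter formula above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: `TransientError`, `full_jitter_delay`, `retry_with_full_jitter`, and the defaults `base=0.1` s, `cap=10.0` s and `max_attempts=5` are hypothetical names and values chosen for the example, not any particular library's API. The injectable `sleep` parameter exists only to make the sketch testable.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")
_rng = random.Random()  # shared generator; pass a seeded one for reproducible tests


class TransientError(Exception):
    """Hypothetical marker for errors worth retrying (timeouts, connection resets)."""


def full_jitter_delay(attempt: int, base: float = 0.1, cap: float = 10.0,
                      rng: random.Random = _rng) -> float:
    """Full jitter: sleep = rand(0, min(cap, base * 2**attempt)) seconds."""
    exponent = min(attempt, 60)  # clamp so huge attempt counts cannot overflow a float
    return rng.uniform(0.0, min(cap, base * 2 ** exponent))


def retry_with_full_jitter(op: Callable[[], T], max_attempts: int = 5,
                           base: float = 0.1, cap: float = 10.0,
                           sleep: Callable[[float], None] = time.sleep) -> T:
    """Call op, retrying only TransientError, sleeping a full-jitter delay between tries."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    for attempt in range(max_attempts):
        try:
            return op()
        except TransientError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last transient error
            sleep(full_jitter_delay(attempt, base, cap))
    raise AssertionError("unreachable")
```

Because each delay is drawn uniformly from the whole window rather than set to its upper edge, two clients that failed at the same instant are unlikely to retry at the same instant, while the window itself still grows exponentially up to `cap`. Only `TransientError` is retried, so non-transient failures surface immediately.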
Journey Context:
AWS's Architecture Blog and the Google SRE Book both describe the 'thundering herd' problem: when a server recovers, all clients with aligned retry intervals hit it simultaneously. Pure exponential backoff (2^attempt) does not break this alignment. Clients that failed at the same moment compute identical delays and keep retrying in synchronized waves, so each wave still arrives as a spike. Full jitter desynchronizes clients while preserving the backoff curve. The circuit breaker pattern (from Michael Nygard's 'Release It!') is orthogonal: it stops retries entirely once errors exceed a threshold, so clients stop hammering a sick dependency. The common mistake is implementing retries without both jitter and a circuit breaker, which turns a transient glitch into a distributed denial-of-service against your own infrastructure.
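A minimal, single-threaded circuit breaker sketch in Python, assuming the simplest trip rule (a count of consecutive failures). It opens after `failure_threshold` failures, rejects calls with `CircuitOpenError` while open, and after `reset_timeout` seconds lets trial calls through (half-open): a success closes it, a failure reopens it. The class, its parameters, and the injectable `clock` are illustrative assumptions, not Nygard's reference design or any library's API. Production breakers usually add locking, limit concurrent half-open probes, and often trip on an error rate rather than a consecutive count.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(Exception):
    """Raised when the breaker is open and the call is rejected without trying."""


class CircuitBreaker:
    """Minimal consecutive-failure breaker: closed -> open -> half-open -> closed/open."""

    def __init__(self, failure_threshold: int = 5, reset_timeout: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None  # None means the breaker is closed

    @property
    def state(self) -> str:
        if self.opened_at is None:
            return "closed"
        if self.clock() - self.opened_at >= self.reset_timeout:
            # Cool-down elapsed: trial calls are let through. This sketch does
            # not limit how many concurrent probes a half-open breaker admits.
            return "half-open"
        return "open"

    def call(self, op: Callable[[], T]) -> T:
        if self.state == "open":
            raise CircuitOpenError("dependency marked unhealthy; failing fast")
        try:
            result = op()
        except Exception:
            self._record_failure()
            raise
        self._record_success()
        return result

    def _record_failure(self) -> None:
        self.failures += 1
        if self.state == "half-open" or self.failures >= self.failure_threshold:
            self.opened_at = self.clock()  # (re)open and restart the cool-down

    def _record_success(self) -> None:
        self.failures = 0
        self.opened_at = None  # any success closes the breaker
```

To combine the two defenses, a caller would wrap each attempt of a jittered retry loop in `breaker.call(...)` and treat `CircuitOpenError` as non-retryable, so that once the dependency is marked unhealthy the loop stops immediately instead of sleeping and trying again.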
Lifecycle
2026-06-21T06:14:02.151649+00:00— report_created — created