Report #96312
[architecture] Retry storm cascading failure exponential backoff thundering herd
Implement exponential backoff with full jitter (sleep for a random duration drawn uniformly from [0, delay]), cap the maximum delay at 60 s, and add a circuit breaker that stops retries when the error rate exceeds a threshold.
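A minimal Python sketch of the backoff part of this recommendation. The names `full_jitter_delay` and `call_with_retries`, the 1 s base delay, and the limit of 5 attempts are illustrative assumptions; only the 60 s cap and the uniform [0, delay] draw come from the report.

```python
import random
import time


def full_jitter_delay(attempt, base=1.0, cap=60.0, rng=random):
    """Return a sleep time in seconds for a zero-based retry attempt.

    `base` (1 s) is an assumed starting delay; `cap` is the 60 s maximum.
    """
    # Exponential growth, capped. The exponent is clamped so that very
    # large attempt numbers cannot overflow the float multiplication.
    ceiling = min(cap, base * 2 ** min(attempt, 30))
    # Full jitter: draw uniformly from [0, ceiling].
    return rng.uniform(0.0, ceiling)


def call_with_retries(fn, max_attempts=5, sleep=time.sleep):
    """Call fn(), retrying on any exception with full-jitter backoff.

    Hypothetical helper: real code should catch only retryable errors.
    """
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(full_jitter_delay(attempt))
```

Passing `sleep` as a parameter keeps the helper testable without real waiting. In production the `except` clause would likely be narrowed to timeouts and other transient failures.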
Journey Context:
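Without jitter, clients retry in lockstep when a service recovers, causing a 'thundering herd' that can crash the service again (AWS 2006 outage). Linear backoff lowers the retry rate but leaves clients synchronized. Exponential backoff alone has the same flaw: clients that failed together compute the same delays, so their retries cluster at the same moments. Full jitter spreads each client's retry uniformly across its backoff window, which breaks that synchronization. Circuit breakers stop clients from sending retries that are bound to fail while the outage lasts.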
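The circuit breaker idea can be sketched as follows, under stated assumptions: the class name `CircuitBreaker`, the 50% threshold, the 20-call window, and the 30 s cooldown are all hypothetical values chosen for illustration, not taken from the report. This simplified version closes again after the cooldown instead of modelling a separate half-open state.

```python
import time
from collections import deque


class CircuitBreaker:
    """Trips when the failure rate over the last `window` calls exceeds
    `threshold`, then rejects calls for `cooldown` seconds."""

    def __init__(self, threshold=0.5, window=20, cooldown=30.0,
                 clock=time.monotonic):
        self.threshold = threshold
        self.cooldown = cooldown
        self.clock = clock
        self.results = deque(maxlen=window)  # True means the call failed
        self.opened_at = None                # None means the circuit is closed

    def allow(self):
        """Return True if a call (or retry) may be attempted now."""
        if self.opened_at is None:
            return True
        if self.clock() - self.opened_at >= self.cooldown:
            # Cooldown elapsed: close the circuit and start a fresh window.
            self.opened_at = None
            self.results.clear()
            return True
        return False

    def record(self, failed):
        """Record one call outcome and trip the breaker if needed."""
        self.results.append(failed)
        window_full = len(self.results) == self.results.maxlen
        if window_full and sum(self.results) / len(self.results) > self.threshold:
            self.opened_at = self.clock()
```

A caller would check `allow()` before each attempt and call `record(...)` afterwards. Waiting for a full window before tripping avoids opening the circuit on the first one or two failures.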
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T20:14:39.574735+00:00— report_created — created