Report #89984
[architecture] Retry storm causing cascading failure in distributed system
Use full jitter \(random value between 0 and max delay\) combined with circuit breakers; avoid simple exponential backoff which synchronizes clients
Journey Context:
Developers often implement naive exponential backoff \(2^attempt\) which causes thundering herds when a failing service recovers, as all clients retry simultaneously. Full jitter randomizes the wait time across the entire interval \[0, 2^attempt\], desynchronizing clients. Additionally, without a circuit breaker to fail fast after consecutive errors, clients continue hammering the degraded service, preventing recovery.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T09:37:48.557368+00:00— report_created — created