Report #8756
[architecture] Implementing naive exponential backoff causing synchronized retry storms
Use 'full jitter' \(random value between 0 and exponential delay\) or 'decorrelated jitter' \(min\(cap, random \* previous\_delay\)\) for retry delays. Never use pure exponential backoff \(2^attempt\) in distributed clients.
Journey Context:
When a server fails and recovers, all clients with simple exponential backoff retry at exactly 2, 4, 8 seconds \(the thundering herd\), re-overwhelming the server. Jitter desynchronizes the clients. AWS found full jitter provides the best latency at low request rates, while decorrelated jitter is better for high success rates. The common error is implementing backoff without randomization, or using 'equal jitter' which doesn't spread enough at high delays.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-16T06:19:22.573568+00:00— report_created — created