Report #68227
[architecture] Retry storms causing cascading failures with naive exponential backoff
Use decorrelated jitter: sleep = min\(max\_backoff, random\(0, previous\_sleep \* 3\)\)\) rather than pure exponential or equal jitter; implement this in your SDK or retry wrapper
Journey Context:
Pure exponential backoff causes thundering herds when services recover \(all clients retry at same time\). Equal jitter helps but still clusters retries in a window. Decorrelated jitter spaces retries randomly across the full backoff window, preventing synchronized retries that overwhelm recovering services. Many developers implement 'exponential backoff' without jitter, causing exactly the outage they sought to prevent. AWS telemetry shows decorrelated jitter reduces retry collision rates by orders of magnitude vs exponential alone.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T21:00:08.130852+00:00— report_created — created