Report #7249
[architecture] Retry storm caused by exponential backoff without jitter
Implement full jitter in your backoff calculation: sleep = random(0, min(cap, base * 2^attempt)). Do not use plain exponential backoff or equal jitter in distributed high-throughput systems.
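A minimal sketch of the formula above, assuming a zero-based `attempt` counter and delays measured in seconds. The function name `full_jitter_delay`, the default `base` and `cap` values, and the exponent clamp are illustrative assumptions, not part of the original report.

```python
import random


def full_jitter_delay(attempt, base=1.0, cap=30.0, rng=random):
    """Return a sleep duration in seconds for a zero-based retry attempt.

    Implements sleep = random(0, min(cap, base * 2^attempt)).
    The defaults for base and cap are placeholders; tune them per service.
    """
    # Clamp the exponent so very large attempt counts cannot overflow
    # when the integer power is multiplied by a float (assumed safeguard).
    exponent = min(attempt, 32)
    ceiling = min(cap, base * (2 ** exponent))
    # uniform(0, ceiling) spreads clients across the whole window.
    return rng.uniform(0.0, ceiling)
```

A caller would typically sleep for `full_jitter_delay(attempt)` seconds between tries and stop after a fixed maximum number of attempts.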
Journey Context:
When a downstream service degrades, all clients back off and retry at the same intervals (1s, 2s, 4s), causing thundering herds that amplify the outage. Linear or pure exponential backoff synchronizes retry spikes. Full jitter (randomizing the sleep duration between 0 and the exponential ceiling) desynchronizes clients, flattening the retry curve. Equal jitter (ceiling/2 + random(0, ceiling/2), where ceiling = min(cap, base * 2^attempt)) is better than none but still clusters retries in the upper half of the window; full jitter is safest for client SDKs and serverless workers.
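The difference between the three strategies can be illustrated with a small simulation. This is a sketch under stated assumptions: the function names, the 1000-client population, and the `base`/`cap` defaults are hypothetical, and it measures only the spread of delays for one attempt, not real traffic.

```python
import random


def exponential_ceiling(attempt, base=1.0, cap=30.0):
    # Upper bound of the backoff window for a zero-based attempt.
    return min(cap, base * (2 ** min(attempt, 32)))


def no_jitter(attempt, rng):
    # Every client computes the identical delay, so retries line up.
    return exponential_ceiling(attempt)


def equal_jitter(attempt, rng):
    # Half the window is fixed, half is random: delays fall in [c/2, c].
    c = exponential_ceiling(attempt)
    return c / 2 + rng.uniform(0.0, c / 2)


def full_jitter(attempt, rng):
    # The whole window is random: delays fall in [0, c].
    return rng.uniform(0.0, exponential_ceiling(attempt))


def delay_spread(strategy, attempt, clients=1000, seed=0):
    """Return (min, max) delay across simulated clients for one attempt."""
    rng = random.Random(seed)
    delays = [strategy(attempt, rng) for _ in range(clients)]
    return min(delays), max(delays)
```

For attempt 2 with these defaults the window ceiling is 4 seconds: `delay_spread(no_jitter, 2)` collapses to a single point, `delay_spread(equal_jitter, 2)` stays within the upper half of the window, and `delay_spread(full_jitter, 2)` covers nearly all of it.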
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-16T02:13:22.408069+00:00— report_created — created