Report #90183
[architecture] Retrying failed network requests with simple exponential backoff causes a thundering herd on recovery
Add jitter (randomization) to backoff intervals, using either "Full Jitter" (sleep = random(0, min(cap, base * 2^attempt))) or "Decorrelated Jitter", to desynchronize client retries and prevent synchronized traffic waves.
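A minimal Python sketch of the Full Jitter formula, wrapped in a retry loop. The names `full_jitter_delay` and `retry_with_full_jitter`, the defaults (base = 1 s, cap = 30 s, 5 attempts) and the choice of `ConnectionError`/`TimeoutError` as the retryable errors are illustrative assumptions, not part of the report.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def full_jitter_delay(attempt: int, base: float = 1.0, cap: float = 30.0) -> float:
    """Sleep in seconds before retry `attempt` (0-based), per Full Jitter:
    sleep = random(0, min(cap, base * 2**attempt)).
    """
    if attempt < 0:
        raise ValueError("attempt must be >= 0")
    # Clamp the exponent so huge attempt counts cannot overflow; 2**62 is far
    # beyond any realistic cap/base ratio, so min() still picks cap.
    ceiling = min(cap, base * 2 ** min(attempt, 62))
    # Uniform draw over the whole window: clients that failed together
    # spread out instead of retrying in lockstep.
    return random.uniform(0.0, ceiling)


def retry_with_full_jitter(
    operation: Callable[[], T],
    max_attempts: int = 5,
    base: float = 1.0,
    cap: float = 30.0,
    # Hypothetical default: only these errors are treated as retryable.
    retryable: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `operation`, retrying retryable errors with Full Jitter backoff."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except retryable:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(full_jitter_delay(attempt, base, cap))
    # Only reached when the loop never runs, i.e. max_attempts < 1.
    raise ValueError("max_attempts must be >= 1")
```

Passing `sleep=lambda s: None` (or a list's `append`) lets tests exercise the retry path without real waits. Non-retryable exceptions propagate immediately, since retrying them would only add load.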
Journey Context:
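Without jitter, clients whose requests failed at the same moment retry on the same schedule (after 1s, 2s, 4s, ...), so when the outage ends their retries arrive in synchronized waves: a thundering herd that can crash the recovering service again. Jitter spreads retry times across the backoff window. Full Jitter draws each sleep uniformly from [0, min(cap, base * 2^attempt)], which spreads retries evenly across that window; every sleep is bounded above by cap, though any single sleep may be close to zero. Decorrelated Jitter (sleep = min(cap, rand(base, sleep_prev * 3))) grows each sleep from the previous one rather than from the attempt count, which keeps every sleep between base and cap.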
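The Decorrelated Jitter recurrence can be sketched the same way. This Python generator is an illustration under stated assumptions: the names `decorrelated_jitter` and `peak_load`, the 0.1 s bucket and the 1,000-client scenario are hypothetical. `peak_load` counts how many first retries land in the busiest 0.1 s slice, which makes the synchronized wave described above visible.

```python
import random
from collections import Counter
from typing import Iterable, Iterator, Optional


def decorrelated_jitter(
    base: float = 1.0, cap: float = 30.0, rng: Optional[random.Random] = None
) -> Iterator[float]:
    """Yield successive sleeps: sleep = min(cap, uniform(base, sleep_prev * 3)),
    with sleep_prev starting at base. Every value lies in [base, cap]."""
    if base <= 0 or cap < base:
        raise ValueError("require 0 < base <= cap")
    rng = rng if rng is not None else random.Random()
    sleep_prev = base
    while True:
        # Next sleep depends on the previous one, not on the attempt number.
        sleep_prev = min(cap, rng.uniform(base, sleep_prev * 3))
        yield sleep_prev


def peak_load(delays: Iterable[float], bucket: float = 0.1) -> int:
    """Largest number of retries that fall into any single `bucket`-second slice."""
    counts = Counter(int(d / bucket) for d in delays)
    return max(counts.values())


if __name__ == "__main__":
    clients = 1000
    # Plain exponential backoff: every client's first retry is exactly base = 1 s.
    plain = [1.0] * clients
    # Decorrelated Jitter: each client's first retry is drawn from [1, 3) seconds.
    rng = random.Random(42)
    jittered = [next(decorrelated_jitter(rng=rng)) for _ in range(clients)]
    print("plain peak:", peak_load(plain))  # plain peak: 1000
    print("jittered peak:", peak_load(jittered))
```

With plain backoff all 1,000 first retries share one slice, while the jittered peak is a small fraction of that. A real client would create one generator per retry sequence and start a fresh one after a success.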
Lifecycle
2026-06-22T09:58:04.706249+00:00— report_created — created