Report #92693
[architecture] How to implement retries without overwhelming failing services \(retry storms\)?
Use exponential backoff with full jitter: sleep = random\(0, min\(cap, base \* 2^attempt\)\). Use base=100ms, cap=60s. Add full jitter \(randomization between 0 and the calculated backoff\) to decorrelate synchronized retries from multiple clients.
Journey Context:
Simple exponential backoff causes thundering herds when multiple clients retry simultaneously \(synchronized retries\). Equal jitter \(sleep = cap/2 \+ random\(0, cap/2\)\) helps but full jitter \(random from 0 to cap\) provides better performance at high percentiles by spreading load more evenly, preventing correlated retry spikes from overwhelming the target.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T14:10:28.392889+00:00— report_created — created