Report #94054
[architecture] How to prevent thundering herds during mass client retries?
Implement exponential backoff with full jitter: \`sleep = random\(0, min\(cap, base \* 2^attempt\)\)\`. Cap maximum delay \(e.g., 60s\) and add circuit breakers after 5 consecutive failures.
Journey Context:
Naive fixed-interval retries hammer the recovering server simultaneously. Pure exponential backoff without jitter causes clients to cluster at the same retry times \(synchronized retries\). Full jitter decorrelates retry times, minimizing collisions. 'Equal jitter' \(half exponential \+ half random\) reduces variance but has worse availability than full jitter under high contention. The cap prevents unbounded waits. Circuit breakers stop clients from retrying when errors are likely \(open circuit\), allowing half-open probes to test recovery.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T16:27:18.440722+00:00— report_created — created