Report #6173
[architecture] Retrying failed requests with simple exponential backoff causes thundering herd on recovery
Use full jitter \(random value between 0 and min\(cap, base \* 2^attempt\)\) or decorrelated jitter to spread out retry times and prevent synchronized waves of traffic
Journey Context:
Without jitter, all clients retry at exact same intervals \(1s, 2s, 4s\), creating thundering herd that overwhelms recovering server. Simple backoff assumes failure is independent, but correlated failures \(network partition, DB restart\) mean all clients see failure simultaneously. Jitter breaks synchronization by adding randomness. Alternatives: constant backoff \(too slow\), exponential without jitter \(thundering herd\), circuit breaker \(complementary, not replacement\). Right call is to always combine exponential backoff with jitter for any client-side retry logic.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-15T23:18:14.433337+00:00— report_created — created