Report #77345
[architecture] Thundering herd problem when many clients retry failed requests simultaneously
Implement exponential backoff with full jitter for client-side retries against shared services: sleep = random(0, min(cap, base * 2^attempt)).
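A minimal Python sketch of the rule above follows. The names `full_jitter_delay` and `retry_with_full_jitter`, the defaults (`base=0.1` s, `cap=10.0` s, five attempts), and the `retry_on` and `sleep` parameters are all hypothetical choices for illustration, not part of any particular library. The injectable `sleep` lets tests run without real waiting.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def full_jitter_delay(attempt: int, base: float = 0.1, cap: float = 10.0) -> float:
    """Seconds to sleep before retry number `attempt` (0-based), drawn
    uniformly from [0, min(cap, base * 2**attempt)]."""
    ceiling = min(cap, base * 2 ** attempt)
    return random.uniform(0, ceiling)


def retry_with_full_jitter(
    operation: Callable[[], T],
    max_attempts: int = 5,
    base: float = 0.1,
    cap: float = 10.0,
    retry_on: tuple[type[BaseException], ...] = (Exception,),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `operation` until it succeeds or `max_attempts` calls have failed.

    Between failures, sleep for a full-jitter delay. The last exception is
    re-raised once attempts run out.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return operation()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(full_jitter_delay(attempt, base, cap))
    raise AssertionError("unreachable")  # loop always returns or raises
```

For example, `retry_with_full_jitter(fetch, retry_on=(ConnectionError, TimeoutError))`, where `fetch` is your own call, retries only errors that are plausibly transient. The catch-all default of `(Exception,)` is a convenience for the sketch; retrying non-transient errors (bad input, auth failures) only adds load.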
Journey Context:
Naive immediate retries amplify load on servers that are already struggling. Fixed backoff synchronizes clients (if 1000 clients fail at once, they all retry after 5s, creating a spike). Exponential backoff spaces out retries but can still synchronize them (every client waits 4s, then 8s). Adding jitter (randomization) desynchronizes the clients and smooths the load. Common mistake: using 'equal jitter' (keep half of the exponential delay fixed and randomize the other half) or 'decorrelated jitter' (derive each wait from the previous wait) without understanding the tradeoff. Full jitter provides the best spreading, at the cost of potentially longer waits.
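The strategies described above can be compared with a small simulation: 1000 clients fail at the same instant, each picks its next retry delay, and `peak_load` counts the most retries landing in any one 100 ms window. The constants (`BASE = 1.0` s, `CAP = 30.0` s, the 5 s fixed delay, the bucket width) and all function names are illustrative assumptions; the jitter formulas are the commonly cited ones, not a specific library's implementation.

```python
import random
from collections import Counter

BASE = 1.0   # seconds; illustrative value
CAP = 30.0   # seconds; illustrative value


def fixed_delay(attempt: int) -> float:
    return 5.0  # every client waits the same 5 s


def exponential_delay(attempt: int) -> float:
    return min(CAP, BASE * 2 ** attempt)  # deterministic, so still synchronized


def equal_jitter_delay(attempt: int) -> float:
    # Keep half of the exponential delay, randomize the other half.
    temp = min(CAP, BASE * 2 ** attempt)
    return temp / 2 + random.uniform(0, temp / 2)


def full_jitter_delay(attempt: int) -> float:
    # Randomize over the whole window [0, min(CAP, BASE * 2**attempt)].
    return random.uniform(0, min(CAP, BASE * 2 ** attempt))


def decorrelated_jitter_delay(previous: float) -> float:
    # Next delay depends on the previous delay, not on the attempt number.
    return min(CAP, random.uniform(BASE, previous * 3))


def decorrelated_after(attempt: int) -> float:
    # Chain decorrelated delays from BASE to reach the given attempt.
    delay = BASE
    for _ in range(attempt + 1):
        delay = decorrelated_jitter_delay(delay)
    return delay


def peak_load(delays: list[float], bucket: float = 0.1) -> int:
    """Largest number of retries landing in any single time bucket."""
    counts = Counter(int(d / bucket) for d in delays)
    return max(counts.values())


if __name__ == "__main__":
    random.seed(42)
    clients = 1000
    attempt = 2  # exponential window is BASE * 2**2 = 4 s
    for name, fn in [
        ("fixed", fixed_delay),
        ("exponential", exponential_delay),
        ("equal jitter", equal_jitter_delay),
        ("full jitter", full_jitter_delay),
        ("decorrelated", decorrelated_after),
    ]:
        delays = [fn(attempt) for _ in range(clients)]
        print(f"{name:>12}: peak {peak_load(delays)} retries per 100 ms")
```

Fixed and plain exponential backoff put all 1000 retries in the same bucket. Equal jitter draws from only the upper half of the 4 s window, so its expected count per bucket is twice that of full jitter, which spreads the same retries over the entire window.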
Lifecycle
2026-06-21T12:25:19.662812+00:00— report_created — created