Report #9693
[architecture] Naive exponential retry backoff causes a thundering herd during recovery
Apply Full Jitter, sleep = random(0, min(cap, base * 2^attempt)), or Decorrelated Jitter to desynchronize client retries and prevent synchronized traffic spikes.
Journey Context:
Pure exponential backoff (sleep = base * 2^attempt) is deterministic, so clients that fail at the same moment compute identical delays and retry in lockstep. Each synchronized wave hits a server that is still recovering and can overwhelm it again (thundering herd). AWS's analysis of backoff strategies showed that Full Jitter, which draws the sleep uniformly between 0 and the capped exponential value, provides the best balance of low median time-to-success and low server load. Decorrelated Jitter (sleep = min(cap, random(base, sleep_prev * 3))) derives each delay from the previous delay rather than from the attempt count, and offers lower variance while still preventing synchronization.
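The two formulas above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the function names, the default `base` and `cap` values (in seconds), the `max_attempts` limit, and the injectable `rng` and `sleep` parameters are all assumptions made for the example and do not come from the report.

```python
import random
import time


def full_jitter(attempt, base=0.1, cap=10.0, rng=random):
    """Full Jitter: sleep = random(0, min(cap, base * 2^attempt))."""
    return rng.uniform(0.0, min(cap, base * 2 ** attempt))


def decorrelated_jitter(prev_sleep, base=0.1, cap=10.0, rng=random):
    """Decorrelated Jitter: sleep = min(cap, random(base, prev_sleep * 3)).

    The caller passes `base` as prev_sleep on the first retry.
    """
    return min(cap, rng.uniform(base, prev_sleep * 3))


def retry_with_full_jitter(operation, max_attempts=5, base=0.1, cap=10.0,
                           sleep=time.sleep):
    """Call `operation` until it succeeds or max_attempts is exhausted.

    Hypothetical helper: it retries on any Exception for brevity. A real
    client would likely retry only on errors it knows to be transient.
    """
    for attempt in range(max_attempts):
        try:
            return operation()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(full_jitter(attempt, base, cap))
```

Because each client draws its own random delay, clients that failed together spread their retries across the whole backoff window instead of arriving in one wave. The `cap` bounds the worst-case wait once `base * 2^attempt` grows large.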
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-16T08:48:20.201014+00:00— report_created — created