Report #51647
[architecture] Thundering herd problems when clients retry simultaneously after fixed exponential backoff
Use 'Decorrelated Jitter' for high-contention scenarios, `sleep = min(cap, rand(base, sleep * 3))`, rather than 'Full Jitter', which remains a simpler option when somewhat longer waits are acceptable.
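A minimal Python sketch of the formula above, assuming `rand(a, b)` means a uniform draw from [a, b] (here `random.uniform`) and that the first "previous sleep" is `base`. The generator `decorrelated_jitter`, the wrapper `retry_with_decorrelated_jitter`, its defaults (`base=0.1`, `cap=10.0`, 5 attempts), retrying only on `ConnectionError`, and the `sleep_fn` hook are all hypothetical choices for illustration, not any SDK's API.

```python
import random
import time
from typing import Callable, Iterator, Optional, TypeVar

T = TypeVar("T")


def decorrelated_jitter(base: float, cap: float,
                        rng: Optional[random.Random] = None) -> Iterator[float]:
    """Yield an endless sequence of retry delays (seconds) using Decorrelated Jitter.

    Each delay is drawn uniformly from [base, previous_delay * 3] and clamped to cap.
    The sequence starts with previous_delay = base (an assumption), so the first
    delay falls in [base, 3 * base].
    """
    if rng is None:
        rng = random.Random()
    sleep = base
    while True:
        sleep = min(cap, rng.uniform(base, sleep * 3))
        yield sleep


def retry_with_decorrelated_jitter(
    op: Callable[[], T],
    attempts: int = 5,
    base: float = 0.1,
    cap: float = 10.0,
    sleep_fn: Callable[[float], None] = time.sleep,
) -> T:
    """Hypothetical helper: call op(), retrying on ConnectionError.

    The exception type, defaults, and sleep_fn hook are illustrative assumptions.
    """
    delays = decorrelated_jitter(base, cap)  # fresh delay sequence per call
    for attempt in range(attempts):
        try:
            return op()
        except ConnectionError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last failure
            sleep_fn(next(delays))
    raise ValueError("attempts must be >= 1")
```

Because each delay is drawn from a range that depends on that client's own previous delay, clients that failed at the same instant follow different delay sequences. Passing `sleep_fn` in, rather than calling `time.sleep` directly, keeps the sketch testable without real waiting.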
Journey Context:
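Simple exponential backoff causes synchronized retries when many clients fail at once (e.g., after a DB restart): every client computes the same delay sequence, so they all retry together. Adding random 'jitter' breaks this synchronization. However, not all jitter is equal. Writing `exp = min(cap, base * 2^attempt)` for the capped exponential delay: Full Jitter (`rand(0, exp)`) spreads retries most widely, minimizing collisions, but tends to lengthen total completion time. Equal Jitter (`exp / 2 + rand(0, exp / 2)`) guarantees at least half the backoff, trading some spread for a minimum wait. Decorrelated Jitter (`min(cap, rand(base, previous * 3))`) draws each delay from a range tied to the previous delay rather than the attempt number, so retries stay spread out while the delay remains capped. In AWS's published simulations, Full Jitter and Decorrelated Jitter both cut client work sharply compared with no jitter, Equal Jitter trailed both, and Decorrelated Jitter finished slightly sooner while making somewhat more calls. Full Jitter is still a sound, simpler choice (it needs no state from the previous attempt) when slightly longer completion times are acceptable.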
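A side-by-side Python sketch of the three jitter formulas, writing `exp = min(cap, base * 2**attempt)` for the capped exponential term. The function names, the `busiest_bucket` helper (largest number of clients retrying inside the same time window), and the demo parameters (1000 clients, `base=0.1`, `cap=10.0`, attempt 3, a 10 ms window) are hypothetical, chosen only to make the thundering herd visible.

```python
import random
from collections import Counter
from typing import Iterable


def capped_exp(base: float, cap: float, attempt: int) -> float:
    """Plain exponential backoff without jitter: min(cap, base * 2**attempt)."""
    return min(cap, base * 2 ** attempt)


def full_jitter(base: float, cap: float, attempt: int, rng: random.Random) -> float:
    """Full Jitter: uniform in [0, exp]."""
    return rng.uniform(0, capped_exp(base, cap, attempt))


def equal_jitter(base: float, cap: float, attempt: int, rng: random.Random) -> float:
    """Equal Jitter: keep half of exp, randomize the other half."""
    exp = capped_exp(base, cap, attempt)
    return exp / 2 + rng.uniform(0, exp / 2)


def decorrelated_jitter(base: float, cap: float, previous: float, rng: random.Random) -> float:
    """Decorrelated Jitter: uniform in [base, previous * 3], clamped to cap."""
    return min(cap, rng.uniform(base, previous * 3))


def decorrelated_nth(base: float, cap: float, n: int, rng: random.Random) -> float:
    """Delay of the n-th retry (n >= 1) for a client using Decorrelated Jitter from the start."""
    sleep = base
    for _ in range(n):
        sleep = decorrelated_jitter(base, cap, sleep, rng)
    return sleep


def busiest_bucket(delays: Iterable[float], width: float) -> int:
    """Largest number of delays that land in the same width-second window."""
    counts = Counter(int(d // width) for d in delays)
    return max(counts.values())


if __name__ == "__main__":
    rng = random.Random(0)
    clients, base, cap, attempt, window = 1000, 0.1, 10.0, 3, 0.01
    strategies = {
        "no jitter": lambda: capped_exp(base, cap, attempt),
        "full jitter": lambda: full_jitter(base, cap, attempt, rng),
        "equal jitter": lambda: equal_jitter(base, cap, attempt, rng),
        "decorrelated": lambda: decorrelated_nth(base, cap, attempt + 1, rng),
    }
    for name, draw in strategies.items():
        delays = [draw() for _ in range(clients)]
        print(f"{name:>12}: busiest {window}s window holds {busiest_bucket(delays, window)} clients")
```

Without jitter every client computes the identical delay, so `busiest_bucket` reports all of them in one window; each jittered variant spreads them across many windows. Note that only Decorrelated Jitter needs the previous delay as input, which is why it is modeled here as a chain of draws per client.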
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T17:11:04.347536+00:00— report_created — created