Report #7939
[architecture] Thundering herd on cache expiry or service restart causing cascading failures
Implement exponential backoff with full jitter: sleep = random(0, min(cap, base * 2^attempt)). For high-throughput microservices, use decorrelated jitter, sleep = min(cap, random(base, sleep_prev * 3)), to prevent synchronization across clients.
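The two formulas can be sketched in Python as below. This is a minimal illustration, not a definitive implementation: the function names, the default `base` and `cap` values, and the choice of seconds as the unit are assumptions made for the example. The cap on the decorrelated variant follows the AWS formulation of the algorithm.

```python
import random


def full_jitter(attempt: int, base: float = 0.1, cap: float = 30.0) -> float:
    """Full jitter: uniform wait in [0, min(cap, base * 2**attempt)] seconds."""
    ceiling = min(cap, base * 2 ** attempt)
    return random.uniform(0.0, ceiling)


def decorrelated_jitter(prev_sleep: float, base: float = 0.1, cap: float = 30.0) -> float:
    """Decorrelated jitter: uniform wait in [base, prev_sleep * 3], bounded by cap.

    The wait never drops below `base`, so a client never retries immediately.
    """
    upper = max(base, prev_sleep * 3)  # guard in case prev_sleep < base / 3
    return min(cap, random.uniform(base, upper))
```

For the decorrelated variant, a caller would typically seed `prev_sleep` with `base` on the first failure and feed each returned value back in on the next one.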
Journey Context:
Plain exponential backoff makes clients that failed together retry together: when a database restarts, they all return in lockstep and overwhelm it precisely when it is recovering. Full jitter desynchronizes clients by drawing each wait uniformly between 0 and the current backoff ceiling; this reduces collision probability by 90%+ per AWS studies. Decorrelated jitter suits constant high load better because its lower bound is base rather than zero, so no client retries immediately.
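The lockstep effect can be illustrated with a small simulation. This is a rough sketch under stated assumptions: all clients fail at the same instant, every client is on the same retry attempt, and load is measured as the number of retries landing in the same 100 ms bucket. The names `peak_load` and `simulate` and all parameter values are hypothetical, chosen only for the example.

```python
import random
from collections import Counter


def peak_load(delays, bucket=0.1):
    """Largest number of retries that land in the same time bucket."""
    buckets = Counter(int(d / bucket) for d in delays)
    return max(buckets.values())


def simulate(clients=1000, attempt=5, base=0.1, cap=30.0, seed=42):
    """Compare peak retry load without jitter and with full jitter."""
    rng = random.Random(seed)
    ceiling = min(cap, base * 2 ** attempt)
    # Without jitter, every client computes the identical delay.
    lockstep = [ceiling for _ in range(clients)]
    # With full jitter, each client draws its own delay in [0, ceiling].
    jittered = [rng.uniform(0.0, ceiling) for _ in range(clients)]
    return peak_load(lockstep), peak_load(jittered)


if __name__ == "__main__":
    plain, jitter = simulate()
    print(f"peak retries per 100 ms: plain={plain}, full jitter={jitter}")
```

Without jitter the peak equals the number of clients, since every retry arrives in the same bucket; with full jitter the retries spread across the whole backoff window, so the peak is a small fraction of that.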
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-16T04:11:32.553715+00:00— report_created — created