Report #5762
[architecture] Implementing exponential backoff for client retries without causing thundering herds
Add jitter to every retry delay. With ceiling = min(cap, base * 2^attempt), use either 'Full Jitter' (sleep = random(0, ceiling)) or 'Equal Jitter' (sleep = ceiling / 2 + random(0, ceiling / 2)). Never use pure exponential backoff without jitter in distributed systems.
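The two formulas can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the function names and the defaults (base = 0.1 s, cap = 10 s) are assumptions chosen for the example, and attempt is assumed to be a small zero-based retry counter.

```python
import random


def backoff_ceiling(attempt, base=0.1, cap=10.0):
    """Capped exponential ceiling: min(cap, base * 2^attempt)."""
    return min(cap, base * (2 ** attempt))


def full_jitter(attempt, base=0.1, cap=10.0):
    """Full Jitter: sleep anywhere between 0 and the ceiling."""
    return random.uniform(0.0, backoff_ceiling(attempt, base, cap))


def equal_jitter(attempt, base=0.1, cap=10.0):
    """Equal Jitter: always wait half the ceiling, randomize the other half."""
    half = backoff_ceiling(attempt, base, cap) / 2.0
    return half + random.uniform(0.0, half)
```

A caller would typically do time.sleep(full_jitter(attempt)) between attempts and stop after a fixed number of retries.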
Journey Context:
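Plain exponential backoff causes synchronized retries when many clients fail at the same moment (e.g., a database restart): every client computes the same delays, so retries arrive in waves, creating thundering herds that can crash the recovering service. Jitter desynchronizes the retry schedule. AWS's published analysis of backoff strategies found that full jitter finishes with the least total work (fewest calls) at the cost of higher variance in individual delays, while equal jitter guarantees a minimum wait of half the backoff ceiling in exchange for less spreading. Do not assume a client library adds jitter for you: some 'retry policy' abstractions apply plain exponential delays unless jitter is configured explicitly, so check the defaults of whatever you use (AWS SDK, Polly, etc.).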
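The synchronization effect can be seen in a small simulation. The sketch below is illustrative only: it assumes 1000 clients that all fail at t = 0, retry up to 6 times, and never succeed, and it counts how many retries land in the same 10 ms window. The helper names (retry_times, peak_load) and the parameters are hypothetical.

```python
import random
from collections import Counter

BASE, CAP = 0.1, 10.0  # assumed values, in seconds


def no_jitter(attempt):
    """Pure exponential backoff: identical for every client."""
    return min(CAP, BASE * 2 ** attempt)


def full_jitter(attempt):
    """Full Jitter: random delay between 0 and the capped ceiling."""
    return random.uniform(0.0, min(CAP, BASE * 2 ** attempt))


def retry_times(attempts, delay_fn):
    """Absolute retry times for one client that failed at t = 0."""
    t, times = 0.0, []
    for attempt in range(attempts):
        t += delay_fn(attempt)
        times.append(t)
    return times


def peak_load(clients, attempts, delay_fn, bucket=0.01):
    """Largest number of retries (across all clients) in one time bucket."""
    buckets = Counter()
    for _ in range(clients):
        for t in retry_times(attempts, delay_fn):
            buckets[int(t / bucket)] += 1
    return max(buckets.values())


if __name__ == "__main__":
    print("peak without jitter:", peak_load(1000, 6, no_jitter))
    print("peak with full jitter:", peak_load(1000, 6, full_jitter))
```

Without jitter, every client retries at exactly the same instants, so the peak equals the number of clients. With full jitter the same retries are spread across many buckets and the peak is a fraction of that.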
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-15T22:09:12.225998+00:00— report_created — created