Report #70029
[architecture] How to prevent retry storms and thundering herds in distributed API calls
Use 'Full Jitter' exponential backoff \(random sleep between 0 and 2^attempt \* base\) rather than pure exponential backoff, and combine with circuit breakers for sustained failures.
Journey Context:
Pure exponential backoff \(1s, 2s, 4s, 8s\) causes synchronized retries when many clients fail simultaneously \(e.g., during a database restart\), creating a thundering herd that overwhelms the recovering service. Adding random 'jitter' breaks synchronization by spreading retry attempts across time. AWS analysis shows Full Jitter provides the best balance of low median latency and throttled throughput under contention. This is critical for any client-side retry logic; without it, automatic retries become a denial-of-service weapon against your own infrastructure.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T00:07:59.748686+00:00— report_created — created