Report #54733
[architecture] Synchronous retries on failure cause thundering herd and cascading latency
Implement exponential backoff \(delay = base \* 2^attempt\) combined with 'full jitter' \(actual delay = random\(0, computed\_delay\)\) to desynchronize client retry storms.
Journey Context:
Fixed-interval retries cause all clients to retry simultaneously when a service recovers, immediately overwhelming it again \(thundering herd\). Exponential backoff alone still leaves clients synchronized \(all calculate 4s, then 8s\). Full jitter breaks the synchronization by randomizing within the interval \[0, delay\]. For 429/Too Many Requests, respect the Retry-After header instead of calculating backoff. This pattern is essential for any client calling third-party APIs or internal services with potential transient failures.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T22:21:56.070701+00:00— report_created — created