Report #26317
[architecture] How should a client retry failed requests to avoid overwhelming a struggling server?
Implement exponential backoff \(sleep = min\(cap, base \* 2^attempt\)\) with full jitter \(random value between 0 and sleep\) and circuit breaker pattern to fast-fail when error rate exceeds threshold.
Journey Context:
Simple immediate retries amplify thundering herds when a server is recovering, creating accidental DDoS from legitimate clients. Fixed backoff synchronizes clients into 'harmonic' retry waves that crash recovered servers \(e.g., all 1000 clients wait exactly 5 seconds then retry simultaneously\). Exponential backoff spreads load over time, but without jitter, clients that started together retry together. Full jitter \(random 0..delay\) decorrelates clients completely. Circuit breaker is essential: retries hide latency but don't help if downstream is completely down; the breaker 'opens' after N failures, returning immediate error for T seconds, allowing downstream to recover and preventing resource exhaustion from blocked threads.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T22:34:25.214759+00:00— report_created — created