Report #13777
[architecture] What is the correct algorithm for retrying failed network requests?
Implement exponential backoff with full jitter: sleep = random(0, min(cap, base * 2^attempt)), with a base of 100ms, a cap of 60s, and a maximum of 3-5 retry attempts. Confirm the operation is idempotent before retrying it.
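A minimal Python sketch of the delay formula, assuming a zero-based attempt counter and delays expressed in seconds. The function names `backoff_ceiling` and `full_jitter_delay` are illustrative, not taken from any library.

```python
import random


def backoff_ceiling(attempt, base=0.1, cap=60.0):
    """Upper bound of the wait, in seconds, for a zero-based retry attempt."""
    # Exponential growth, clamped to the cap so the wait never grows unbounded.
    return min(cap, base * (2 ** attempt))


def full_jitter_delay(attempt, base=0.1, cap=60.0):
    """Full jitter: pick the wait uniformly between 0 and the ceiling."""
    return random.uniform(0, backoff_ceiling(attempt, base, cap))


if __name__ == "__main__":
    for attempt in range(5):
        ceiling = backoff_ceiling(attempt)
        delay = full_jitter_delay(attempt)
        print(f"attempt {attempt}: ceiling {ceiling:.1f}s, sleeping {delay:.3f}s")
```

With these defaults the ceilings for the first five attempts are 0.1s, 0.2s, 0.4s, 0.8s and 1.6s, so the cap only comes into play for clients that retry many more times than recommended.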
Journey Context:
Fixed retry intervals cause a 'thundering herd': when a service recovers, all clients retry at once and immediately overload it again. Exponential backoff spreads retries out over time, but without jitter, clients that failed together still compute identical delays (e.g., all waiting exactly 4s, then 8s) and stay synchronized. Full jitter (drawing the wait uniformly between 0 and the calculated backoff) desynchronizes them. The cap prevents excessive waits: with a 100ms base, an uncapped 2^10 multiplier already gives a ceiling of 102.4s, which is too long for user-facing requests. Most API clients should fail fast after 3-5 retries (total wait < 30s). The critical prerequisite is idempotency: retrying a non-idempotent operation (e.g., POST /charge) without an idempotency key is unsafe, because it can double-charge a customer.
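The retry loop and the idempotency prerequisite can be sketched together as follows. This is an illustration under stated assumptions, not a definitive implementation: `TransientError`, `retry_with_backoff`, and the convention that the operation accepts an idempotency key are all hypothetical, and a real client would pass the key in whatever form its server expects.

```python
import random
import time
import uuid


class TransientError(Exception):
    """Hypothetical marker for failures that are safe to retry (e.g. timeouts)."""


def retry_with_backoff(operation, max_retries=4, base=0.1, cap=60.0,
                       sleep=time.sleep):
    """Call operation(idempotency_key), retrying transient failures.

    `operation` is an assumed callable that performs the request and raises
    TransientError when a retry is worthwhile.
    """
    # One key for the whole logical request: every retry reuses it, so the
    # server can recognize duplicates instead of applying the action twice.
    idempotency_key = str(uuid.uuid4())
    for attempt in range(max_retries + 1):
        try:
            return operation(idempotency_key)
        except TransientError:
            if attempt == max_retries:
                raise  # retry budget exhausted: fail fast
            # Full jitter: uniform wait between 0 and the capped exponential.
            sleep(random.uniform(0, min(cap, base * 2 ** attempt)))
```

The `sleep` parameter is injectable so the loop can be exercised without real waiting. Generating the key outside the loop is the important design choice: a fresh key per attempt would defeat the server-side deduplication.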
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-16T19:45:12.156843+00:00— report_created — created