Report #27265
[architecture] Thundering herd problem when retrying failed database connections
Implement exponential backoff with full jitter: sleep = random(0, min(cap, base * 2^attempt)), with a cap of 60s and a maximum of 3 retries, retrying on 5xx errors only.
Journey Context:
Plain exponential backoff (2^attempt) causes synchronized retries when many clients fail at the same moment: they all wait the same interval and hit the server together again (the thundering herd). Full jitter, which picks a random delay between 0 and the calculated value, desynchronizes the clients. Retrying on 4xx (client) errors is a common mistake: the request itself is at fault, so repeating it will not help. Uncapped exponential growth can stretch delays to hours, which is why the delay is capped. The 3-retry limit prevents endless retry loops on permanent failures.
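The retry policy described here could be sketched in Python roughly as follows. This is a minimal illustration, not a verified fix: the base delay of 0.5s is an assumption (the report gives only the cap and the retry limit), and `ServerError`, `backoff_delay`, and `call_with_retries` are hypothetical names introduced for the example. Adapt the status check to whatever error type your database driver or HTTP client actually raises.

```python
import random
import time

BASE_DELAY = 0.5   # seconds; assumed value, the report does not give a base
MAX_DELAY = 60.0   # cap from the report
MAX_RETRIES = 3    # retry limit from the report


class ServerError(Exception):
    """Hypothetical error type carrying an HTTP-style status code."""

    def __init__(self, status):
        super().__init__(f"status {status}")
        self.status = status


def backoff_delay(attempt, base=BASE_DELAY, cap=MAX_DELAY, rng=random):
    """Full jitter: uniform delay in [0, min(cap, base * 2**attempt)]."""
    return rng.uniform(0, min(cap, base * 2 ** attempt))


def call_with_retries(operation, sleep=time.sleep):
    """Run operation; retry up to MAX_RETRIES times, on 5xx errors only."""
    for attempt in range(MAX_RETRIES + 1):
        try:
            return operation()
        except ServerError as err:
            retryable = 500 <= err.status < 600
            # Re-raise client errors immediately, and give up once the
            # retry budget is spent.
            if not retryable or attempt == MAX_RETRIES:
                raise
            sleep(backoff_delay(attempt))
```

Passing `sleep` as a parameter keeps the function testable without real waiting. With the assumed base of 0.5s, the three retries wait at most 0.5s, 1s, and 2s, so the 60s cap only comes into play if the base or the retry limit is raised.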
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T00:09:33.736826+00:00— report_created — created