Report #13603
[architecture] Retry storm and thundering herd prevention
Implement exponential backoff with 'full jitter': each delay is a random value between 0 and min(cap, base * 2^attempt). Set a maximum backoff cap (e.g., 60 seconds) to avoid unbounded waits. Combine with a circuit breaker that stops sending requests once failures cross a threshold (e.g., 5 consecutive failures, or 5 errors within 60 seconds) and enters a half-open state after a reset timeout, where a trial request decides whether the circuit closes again.
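The backoff rule above can be sketched as follows. This is a minimal illustration, not a reference implementation: the function names `full_jitter_delay` and `retry_with_backoff`, the 0.5 second base, and the limit of 5 attempts are assumptions chosen for the example; only the formula and the 60 second cap come from the report.

```python
import random
import time


def full_jitter_delay(attempt, base=0.5, cap=60.0, rng=random):
    """Return a delay in seconds drawn uniformly from [0, min(cap, base * 2**attempt)].

    `base` and `cap` are in seconds; `attempt` starts at 0. (Hypothetical helper.)
    """
    # Clamp the exponent so a very large attempt count cannot overflow the float math.
    exponent = min(attempt, 32)
    ceiling = min(cap, base * (2 ** exponent))
    return rng.uniform(0.0, ceiling)


def retry_with_backoff(operation, max_attempts=5, base=0.5, cap=60.0, sleep=time.sleep):
    """Call `operation` until it succeeds or `max_attempts` is exhausted.

    Sleeps for a full-jitter delay between attempts and re-raises the last error.
    """
    for attempt in range(max_attempts):
        try:
            return operation()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the failure to the caller
            sleep(full_jitter_delay(attempt, base, cap))
```

In real use the `except Exception` clause would likely be narrowed to errors that are safe to retry (timeouts, connection resets), since retrying on every exception can mask bugs.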
Journey Context:
Simple exponential backoff (2^attempt) without jitter keeps clients synchronized: clients that failed together retry together, so when a failed service recovers they arrive as a thundering herd that can crash it again. Full jitter spreads retries across the whole backoff window, which desynchronizes clients most effectively. Equal jitter (random between backoff/2 and backoff) spreads them over only half the window, so it is slightly less safe. The cap keeps delays from growing to hours after many retries. Circuit breakers avoid wasting resources on unhealthy dependencies and give them room to recover.
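A circuit breaker of the kind described here might look like the sketch below. It assumes the simpler of the two trip conditions (N consecutive failures rather than a sliding 60 second window) and single-threaded use; the class name `CircuitBreaker`, the `CircuitOpenError` exception, and the 30 second reset timeout are hypothetical choices for illustration.

```python
import time


class CircuitOpenError(Exception):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    def __init__(self, failure_threshold=5, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold  # consecutive failures before tripping
        self.reset_timeout = reset_timeout          # seconds to stay open before probing
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    @property
    def state(self):
        if self.opened_at is None:
            return "closed"
        if self.clock() - self.opened_at >= self.reset_timeout:
            return "half-open"  # the next call is allowed through as a probe
        return "open"

    def call(self, operation):
        if self.state == "open":
            raise CircuitOpenError("dependency marked unhealthy")
        try:
            result = operation()
        except Exception:
            self.failures += 1
            # Trip on reaching the threshold, or re-open when a half-open probe fails.
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()
            raise
        # Success closes the circuit and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

A caller would typically wrap the breaker around the retry loop, so that rejected calls fail fast instead of sleeping through backoff delays. A concurrent version would need a lock and a way to admit only one probe while half-open.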
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-16T19:13:40.545284+00:00— report_created — created