Report #70589
[architecture] Retry storm (thundering herd) overwhelming downstream service during outages
Implement exponential backoff with full jitter: delay = rand(0, min(cap, base * 2^attempt)); use base=100ms, cap=60s, max attempts 3-5; for queues, back off by extending the visibility timeout instead of requeueing immediately
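A minimal sketch of the full-jitter formula above, assuming delays are expressed in seconds and that any exception from the call is retryable (real code would usually retry only on timeouts, throttling, or 5xx responses). The names `full_jitter_delay` and `call_with_retries` are illustrative and not part of any library.

```python
import random
import time


def full_jitter_delay(attempt, base=0.1, cap=60.0, rng=random):
    """Return a delay in seconds drawn uniformly from [0, min(cap, base * 2**attempt)]."""
    ceiling = min(cap, base * (2 ** attempt))
    return rng.uniform(0.0, ceiling)


def call_with_retries(operation, max_attempts=5, base=0.1, cap=60.0, sleep=time.sleep):
    """Call operation(); on failure, wait a full-jitter delay and try again.

    Assumption: every exception is treated as retryable, which is
    broader than most production code would want.
    """
    for attempt in range(max_attempts):
        try:
            return operation()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # attempts exhausted: surface the error to the caller
            sleep(full_jitter_delay(attempt, base, cap))
```

With base=100ms the ceilings for attempts 0 through 4 are 0.1s, 0.2s, 0.4s, 0.8s, and 1.6s. Each client draws its own delay from anywhere below that ceiling, which is what spreads the retries out.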
Journey Context:
Fixed intervals cause synchronized retries (harmonic spikes); linear backoff is insufficient for cascading failures; exponential backoff without jitter keeps clients correlated (they retry simultaneously after identical delays); AWS's analysis of jitter strategies reports that full jitter (random 0..delay) does less work than equal jitter while finishing in roughly the same time as decorrelated jitter; combine with a circuit breaker so that only limited half-open probe requests reach the downstream service during recovery; a DLQ (dead letter queue) is required after max attempts to prevent infinite retry loops
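The circuit-breaker behaviour described above could be sketched roughly as follows. This is a hypothetical, single-threaded illustration and not any particular library's implementation; the class name, the thresholds, and the choice of one probe at a time in the half-open state are all assumptions.

```python
import time


class CircuitBreaker:
    """Minimal three-state breaker: closed -> open -> half-open -> closed."""

    def __init__(self, failure_threshold=5, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout  # seconds to stay open before probing
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the breaker is closed
        self.probe_in_flight = False

    def allow_request(self):
        if self.opened_at is None:
            return True  # closed: traffic flows normally
        if self.clock() - self.opened_at < self.reset_timeout:
            return False  # open: fail fast, do not touch the downstream service
        if self.probe_in_flight:
            return False  # half-open: only one probe at a time
        self.probe_in_flight = True
        return True

    def record_success(self):
        # Any success closes the breaker and clears the failure count.
        self.failures = 0
        self.opened_at = None
        self.probe_in_flight = False

    def record_failure(self):
        self.failures += 1
        # A failed probe reopens the breaker; in the closed state the
        # breaker opens once the failure threshold is reached.
        if self.opened_at is not None or self.failures >= self.failure_threshold:
            self.opened_at = self.clock()
        self.probe_in_flight = False
```

A caller would check `allow_request()` before each attempt, including retries, and report the outcome with `record_success()` or `record_failure()`. While the breaker is open, retries are rejected locally and never reach the recovering service.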
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T01:04:08.635852+00:00— report_created — created