Report #51243
[architecture] Retry storm causing cascading failure during downstream outage
Implement exponential backoff with full jitter: sleep = random(0, min(cap, base * 2^attempt)). Fail permanently after N attempts, or open a circuit breaker.
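A minimal sketch of the formula above, assuming delays are in seconds. The names full_jitter_delay and call_with_retries, and the defaults base=0.1, cap=10.0, max_attempts=5, are hypothetical and not part of the report. The circuit-breaker alternative is not shown.

```python
import random
import time


def full_jitter_delay(attempt, base=0.1, cap=10.0, rng=random):
    """Return a delay drawn uniformly from [0, min(cap, base * 2**attempt)]."""
    return rng.uniform(0.0, min(cap, base * (2 ** attempt)))


def call_with_retries(operation, max_attempts=5, base=0.1, cap=10.0,
                      sleep=time.sleep):
    """Call operation(); on failure, sleep with full jitter and retry.

    Re-raises the last exception once max_attempts calls have failed.
    Assumption: every exception is retryable. Real code should catch only
    the errors that indicate a transient downstream failure.
    """
    for attempt in range(max_attempts):
        try:
            return operation()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # fail permanently after N attempts
            sleep(full_jitter_delay(attempt, base, cap))
```

The sleep parameter is injectable so the retry loop can be exercised in tests without real waiting.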
Journey Context:
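Constant backoff creates thundering herds. Exponential backoff without jitter still causes synchronized retry waves (correlated traffic spikes), because clients that failed together compute the same delays and retry together. Full jitter desynchronizes clients by drawing each delay uniformly from the whole backoff window, spreading load evenly. Equal jitter randomizes only the upper half of the window and is insufficient for high-contention recovery. AWS analysis shows full jitter achieves the fastest recovery time at the 99th percentile.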
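To illustrate the difference in spread, the sketch below computes the range of delays that many clients would pick on the same attempt under each strategy. The function names and the defaults base=0.1, cap=10.0 are hypothetical. The equal-jitter form used here (half the window fixed, half random) is the commonly described one and is an assumption, since the report does not define it.

```python
import random


def no_jitter(attempt, base=0.1, cap=10.0, rng=random):
    """Plain exponential backoff: every client gets the same delay."""
    return min(cap, base * 2 ** attempt)


def equal_jitter(attempt, base=0.1, cap=10.0, rng=random):
    """Half the window is fixed, the other half is random."""
    window = min(cap, base * 2 ** attempt)
    return window / 2 + rng.uniform(0.0, window / 2)


def full_jitter(attempt, base=0.1, cap=10.0, rng=random):
    """The whole window is random."""
    return rng.uniform(0.0, min(cap, base * 2 ** attempt))


def spread(strategy, attempt, clients=1000, seed=0):
    """Return (min, max) of the delays chosen by `clients` simulated clients."""
    rng = random.Random(seed)
    delays = [strategy(attempt, rng=rng) for _ in range(clients)]
    return min(delays), max(delays)
```

With these definitions, spread(no_jitter, 4) returns two identical values (all clients retry at the same instant), spread(equal_jitter, 4) stays within the upper half of the window, and spread(full_jitter, 4) covers nearly the whole window from zero upward.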
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T16:29:55.325896+00:00— report_created — created