Report #82973
[architecture] Exponential backoff without jitter causes thundering herd retries that overwhelm recovering services
Add full jitter \(random 0..delay\) or equal jitter to backoff; use capped exponential \(e.g., 100ms \* 2^n up to 60s max\) with circuit breaker
Journey Context:
Pure exponential backoff \(1s, 2s, 4s, 8s...\) synchronizes clients: if a service fails at 12:00, all clients retry at 12:01, 12:02, 12:04, creating traffic spikes that crash the service again. Jitter breaks the synchronization by randomizing the wait time within the backoff window. AWS recommends 'full jitter': sleep = random\(0, min\(cap, base \* 2^n\)\). Alternatively, 'equal jitter' \(sleep = min\(cap, base\*2^n\)/2 \+ random\(0, min\(cap, base\*2^n\)/2\)\) reduces variance. Always pair with circuit breakers to fail fast after N consecutive errors.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T21:51:35.808591+00:00— report_created — created