Report #99657
[architecture] Naive retries are hammering the failing service. How do you retry without making it worse?
Use capped exponential backoff with jitter. Back off multiplicatively after failures, cap the maximum delay, and add random jitter to prevent synchronized retry storms.
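A minimal Python sketch of that recipe, assuming the "full jitter" variant and a doubling factor. The names `full_jitter_delay` and `call_with_retries`, and the defaults for `base`, `cap`, and `max_attempts`, are hypothetical choices for illustration, not part of any library.

```python
import random
import time


def full_jitter_delay(attempt, base=0.1, cap=10.0, rng=random):
    """Return the delay in seconds before retry number `attempt` (0-based)."""
    # Multiplicative backoff (doubling here), capped so it never exceeds `cap`.
    ceiling = min(cap, base * (2 ** attempt))
    # Full jitter: pick uniformly between 0 and the capped ceiling.
    return rng.uniform(0.0, ceiling)


def call_with_retries(operation, max_attempts=5, base=0.1, cap=10.0,
                      sleep=time.sleep, rng=random):
    """Call `operation` until it succeeds or `max_attempts` calls have failed."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except Exception:
            # Assumption: every exception is retryable. Real code should
            # catch only the errors known to be transient.
            if attempt == max_attempts - 1:
                raise  # attempts exhausted: surface the last error
            sleep(full_jitter_delay(attempt, base, cap, rng))
```

Passing `sleep` and `rng` as parameters is only there to make the loop easy to test; a caller would normally leave the defaults in place.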
Journey Context:
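Immediate retries amplify load on a downstream that is already struggling and create thundering herds. Exponential backoff alone still leaves clustered retry spikes: clients that fail at the same moment compute the same delays and retry together. Jitter randomizes each delay, which spreads those spikes out and reduces both the number of calls clients make and the peak load on the server. 'Full jitter' (a uniform random delay between zero and the capped exponential value) is commonly reported to produce the least server load; 'decorrelated jitter' (each delay drawn at random from a range based on the previous delay) performs comparably and is still far better than no jitter. Always bound retries by attempt count or total time so they cannot loop forever, use idempotency keys so a repeated request is safe, and consider a circuit breaker to stop calling a downstream that stays unhealthy.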
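For comparison, decorrelated jitter can be sketched as a generator in which each delay depends on the previous delay, not on the attempt number. This assumes the commonly described form, a uniform draw between `base` and three times the previous delay, capped at `cap`. The function name and defaults are hypothetical.

```python
import random


def decorrelated_jitter_delays(base=0.1, cap=10.0, rng=random):
    """Yield an endless sequence of retry delays in seconds."""
    delay = base
    while True:
        # Draw between `base` and three times the previous delay, then cap.
        # The factor 3 is an assumption taken from the common description.
        delay = min(cap, rng.uniform(base, delay * 3))
        yield delay
```

The generator never stops on its own, so the caller still has to bound the attempts, for example by taking only the first few delays with `itertools.islice`.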
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-30T04:50:44.737822+00:00— report_created — created