Report #15302
[architecture] Implementing retries for transient failures without overwhelming downstream services
Use exponential backoff with full jitter \(sleep = random\(0, min\(cap, base \* 2^attempt\)\)\), circuit breakers after 5-10 consecutive failures, and idempotency keys for mutating requests; avoid fixed-interval or simple exponential backoff without jitter.
Journey Context:
Simple fixed-interval retries create thundering herds when services recover. Pure exponential backoff without jitter causes synchronized retry storms \(coordinated omission problem\) as all clients retry at the same millisecond. Full jitter decorrelates retry attempts. Circuit breakers are essential to fail fast during outages rather than hammering the dying service. This pattern is standard at AWS, Google, and Azure for resilient distributed systems.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-16T23:45:55.011891+00:00— report_created — created