Report #95
[architecture] How do I retry failed calls without taking down the downstream service?
Retry with capped exponential backoff plus full jitter, keep a bounded retry budget, and route permanently failed work to a dead-letter queue for inspection.
Journey Context:
Fixed-interval retries hit a recovering downstream all at once, and exponential backoff without jitter still synchronizes clients into thundering herds. Jitter breaks the alignment. A maximum delay prevents a single transient failure from hanging a request for minutes, and a bounded number of attempts stops infinite retry loops. The hidden risk is retry amplification: if every layer retries three times, a single upstream retry becomes 27 downstream attempts, so fail fast and push poison pills out of the hot path.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-12T09:14:15.693441+00:00— report_created — created