Report #40662
[architecture] Preventing cascading failures when calling external services with retries
Wrap external HTTP calls with a circuit breaker that opens (fails fast) after 5 errors in 60 seconds, returning a degraded response or cached fallback; attempt a half-open test request after 30 seconds to detect recovery.
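A minimal sketch of that policy in Python, for illustration only. The class name `CircuitBreaker`, its parameters, the `CircuitOpenError` exception, and the injectable `clock` are all assumptions made for this example and do not come from any particular library. The sketch is single-threaded; a production version would need locking and would typically limit the half-open state to one in-flight probe.

```python
import time
from collections import deque


class CircuitOpenError(Exception):
    """Raised when the breaker is open and no fallback was supplied."""


class CircuitBreaker:
    """Hypothetical breaker: opens after max_failures within window_s seconds."""

    def __init__(self, max_failures=5, window_s=60.0, reset_s=30.0,
                 clock=time.monotonic):
        self.max_failures = max_failures
        self.window_s = window_s
        self.reset_s = reset_s
        self.clock = clock          # injectable so tests can control time
        self.failures = deque()     # timestamps of recent failures
        self.opened_at = None       # None means the circuit is closed

    @property
    def state(self):
        if self.opened_at is None:
            return "closed"
        if self.clock() - self.opened_at >= self.reset_s:
            return "half-open"      # cool-down elapsed: allow a test request
        return "open"

    def call(self, func, fallback=None):
        state = self.state
        if state == "open":
            # Fail fast: do not touch the struggling service at all.
            if fallback is None:
                raise CircuitOpenError("circuit open; failing fast")
            return fallback()
        try:
            result = func()
        except Exception:
            self._record_failure(state)
            if fallback is None:
                raise
            return fallback()
        if state == "half-open":
            # The test request succeeded: close the circuit and start fresh.
            self.failures.clear()
            self.opened_at = None
        return result

    def _record_failure(self, state):
        now = self.clock()
        if state == "half-open":
            # The test request failed: re-open for another reset_s seconds.
            self.opened_at = now
            return
        self.failures.append(now)
        # Drop failures that have fallen out of the sliding window.
        while self.failures and now - self.failures[0] > self.window_s:
            self.failures.popleft()
        if len(self.failures) >= self.max_failures:
            self.opened_at = now
```

In use, the HTTP call and the fallback would each be passed as a zero-argument callable, for example `breaker.call(lambda: fetch_profile(user_id), fallback=lambda: cached_profile)`, where `fetch_profile` and `cached_profile` are hypothetical names. Counting failures in a sliding window rather than consecutively means occasional successes from a flapping service do not reset the count.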
Journey Context:
Aggressive retry logic without circuit breakers amplifies outages: when a downstream service is struggling, retries create a "retry storm" that overwhelms it further, and the failure cascades to other services. With three retries per request, for example, a failing dependency can see up to four times its normal traffic. Developers often configure linear or exponential backoff but overlook that during an outage every retry, however well spaced, adds load to a service that cannot absorb it. The circuit breaker acts as a safety valve, trading temporary availability for system stability. Common errors are setting the threshold too high (e.g., a 50% failure rate), so the breaker opens too late, and omitting the half-open state, which prevents automatic recovery detection.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T22:43:16.533484+00:00— report_created — created