Report #47323
[architecture] Preventing cascade failures when calling external services or microservices
Wrap external calls in a circuit breaker that tracks the failure rate over a rolling window. When failures exceed a threshold (e.g., 50% over 30s), open the circuit and fail fast for a timeout period (e.g., 60s), returning cached data or a degraded response instead of calling the service. After the timeout, move to half-open and let limited traffic through to test recovery. Close the circuit only once a success threshold is met.
Journey Context:
Without circuit breakers, a slow downstream service (e.g., one that takes the full 30s timeout to respond) causes thread pool exhaustion in the caller as requests queue up, and the failure cascades across the system. Developers often respond by shortening timeouts, but on its own this just moves the problem: callers that time out quickly and retry produce retry storms against the already struggling service. The circuit breaker pattern, from Michael Nygard's 'Release It!', monitors failures in a rolling window. While the circuit is open, the caller immediately returns a fallback (cache, default, or error) without attempting the call, which gives the downstream service time to recover. The half-open state is crucial: when the timeout expires, the breaker lets through a single probe request, or a small amount of traffic, to test health before fully closing. This prevents immediate re-failure if the service is still down.
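The three states described above can be sketched as a small state machine. This is a minimal, single-threaded illustration, not a production implementation: the class name `CircuitBreaker`, the `CircuitOpenError` exception, and every parameter (`failure_rate`, `window_s`, `open_s`, `min_calls`, `success_threshold`, `clock`) are hypothetical names chosen for this example. The defaults mirror the example values in this report (50% over 30s, 60s open period); `min_calls` is an added assumption so that one failure out of one call does not trip the circuit.

```python
import time
from collections import deque


class CircuitOpenError(Exception):
    """Raised when the circuit is open and no fallback was supplied."""


class CircuitBreaker:
    def __init__(self, failure_rate=0.5, window_s=30.0, open_s=60.0,
                 min_calls=5, success_threshold=2, clock=time.monotonic):
        self.failure_rate = failure_rate            # trip above this failure ratio
        self.window_s = window_s                    # rolling window length in seconds
        self.open_s = open_s                        # how long to fail fast once open
        self.min_calls = min_calls                  # assumption: ignore tiny samples
        self.success_threshold = success_threshold  # probes needed to close again
        self.clock = clock                          # injectable so tests can fake time
        self.state = "closed"
        self.opened_at = 0.0
        self.half_open_successes = 0
        self.results = deque()                      # (timestamp, succeeded) pairs

    def _trip(self):
        self.state = "open"
        self.opened_at = self.clock()
        self.results.clear()

    def _record(self, ok):
        now = self.clock()
        self.results.append((now, ok))
        # Drop results that have fallen out of the rolling window.
        while self.results and now - self.results[0][0] > self.window_s:
            self.results.popleft()
        failures = sum(1 for _, succeeded in self.results if not succeeded)
        if (len(self.results) >= self.min_calls
                and failures / len(self.results) > self.failure_rate):
            self._trip()

    def _fail_fast(self, fallback):
        if fallback is not None:
            return fallback()
        raise CircuitOpenError("circuit is open")

    def call(self, func, fallback=None):
        if self.state == "open":
            if self.clock() - self.opened_at < self.open_s:
                # Still inside the open period: do not touch the service.
                return self._fail_fast(fallback)
            # Open period has expired: allow probe traffic.
            self.state = "half_open"
            self.half_open_successes = 0
        try:
            result = func()
        except Exception:
            if self.state == "half_open":
                self._trip()            # a failed probe re-opens the circuit
            else:
                self._record(False)
            if fallback is not None:
                return fallback()
            raise
        if self.state == "half_open":
            self.half_open_successes += 1
            if self.half_open_successes >= self.success_threshold:
                self.state = "closed"
        else:
            self._record(True)
        return result
```

A caller would wrap each outbound request, for example `breaker.call(lambda: fetch_profile(user_id), fallback=lambda: cached_profile)`, where `fetch_profile` and `cached_profile` stand in for whatever client call and cached value the application has. The sketch is not thread-safe and does not cap concurrent probes in the half-open state; a real service would add locking or use an established resilience library.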
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T09:54:41.963852+00:00— report_created — created