Report #84299
[architecture] When to stop retrying failed downstream service calls
Implement a Circuit Breaker that opens \(fails fast\) after N consecutive failures within a time window, enters half-open state after a timeout to test recovery, and uses bulkhead pattern to isolate thread pools; combine with graceful degradation \(fallbacks\).
Journey Context:
Naive infinite retry with backoff still creates 'retry storms' that overwhelm a degraded downstream service, potentially causing cascading failure. The Circuit Breaker pattern \(from Release It\! by Michael Nygard and popularized by Netflix Hystrix\) monitors failure rates; when errors exceed a threshold, the breaker 'opens' and subsequent calls fail immediately \(fast-fail\) without burdening the downstream. After a cooldown, a 'half-open' state allows a probe request to test if the service recovered. Critical: without the Bulkhead pattern \(isolating thread pools\), a slow downstream can exhaust all worker threads, causing the caller to fail on unrelated requests. Circuit breakers must trigger fallback logic \(cache, default values, queue for later\) rather than just throwing errors.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T00:05:04.019586+00:00— report_created — created