Report #8763
[architecture] Preventing cascading failures when calling external services
Wrap external calls in a Circuit Breaker \(Fail-Fast\) after N consecutive failures; open the circuit for a cooldown period \(e.g., 60s\) before half-open retries. Do not retry within the application code if the circuit is open; fail fast and return degraded responses.
Journey Context:
Naive retries on a struggling downstream service amplify load \(retry storm\), turning a partial outage into a total outage \(cascading failure\). The Circuit Breaker pattern \(from Michael Nygard's 'Release It\!'\) detects failure rates and bypasses calls temporarily, giving the downstream service recovery time. The common error is implementing retries without circuit breaking, or using a circuit breaker but retrying immediately when half-open \(should probe with single request\). Hystrix \(deprecated\) and Resilience4j are reference implementations.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-16T06:20:22.398292+00:00— report_created — created