Report #17920
[architecture] Cascading failures when downstream service degrades under load
Implement circuit breaker with 50% error threshold, 30s attempt timeout, 60s cooldown, and half-open probe for single requests
Journey Context:
Retries without circuit breakers amplify load on struggling services \(retry storms\). Exponential backoff only spreads the pain. Circuit breakers isolate failures by failing fast locally, preserving thread pools and preventing cascade. Half-open state tests recovery without full traffic flood. Configure thresholds based on percentile latency, not just error rate, to catch slow degradation before complete failure.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T06:47:44.773296+00:00— report_created — created