Report #88067
[architecture] Cascading failures when downstream service latency degrades
Implement a circuit breaker with three states (Closed/Open/Half-Open): fail fast while open and return a fallback; in the half-open state, let a single probe request through to test recovery
Journey Context:
Without circuit breakers, clients keep sending requests to a failing downstream service, tying up threads and connections while they wait for timeouts. This resource exhaustion prevents the client from handling other requests and can bring down the entire system (a cascading failure). A circuit breaker monitors the failure rate; when it exceeds a threshold (e.g., 50% failures in 60 seconds), the breaker 'opens' and immediately fails all requests for a cooldown period (e.g., 60s), giving the downstream service time to recover. After the cooldown, the breaker enters the 'half-open' state, where the next request acts as a probe: if it succeeds, the breaker closes; if it fails, the breaker reopens immediately. Critical implementation detail: the half-open state must let only a single test request through, not all traffic. Otherwise the full request load hits a service that has only just recovered, triggering another thundering herd.
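The state machine described above can be sketched as follows. This is a minimal illustration, not a production implementation: the names `CircuitBreaker`, `CircuitOpenError`, `call`, and the `min_calls` and `clock` parameters are assumptions introduced for the example, and the defaults simply mirror the 50% / 60 seconds / 60s figures given as examples in the text.

```python
import threading
import time
from collections import deque


class CircuitOpenError(Exception):
    """Raised when the breaker rejects a call without contacting the service."""


class CircuitBreaker:
    # min_calls (hypothetical) avoids tripping on a tiny sample;
    # clock is injectable so the cooldown can be tested without sleeping.
    def __init__(self, failure_ratio=0.5, window_s=60.0, cooldown_s=60.0,
                 min_calls=10, clock=time.monotonic):
        self.failure_ratio = failure_ratio
        self.window_s = window_s
        self.cooldown_s = cooldown_s
        self.min_calls = min_calls
        self.clock = clock
        self.state = "closed"
        self._opened_at = 0.0
        self._probe_in_flight = False
        self._results = deque()  # (timestamp, succeeded) pairs in the window
        self._lock = threading.Lock()

    def _admit(self):
        """Admit or reject a call; returns True if the call is the probe."""
        with self._lock:
            now = self.clock()
            if self.state == "open":
                if now - self._opened_at < self.cooldown_s:
                    raise CircuitOpenError("circuit open")
                self.state = "half_open"  # cooldown elapsed
                self._probe_in_flight = False
            if self.state == "half_open":
                if self._probe_in_flight:
                    # Only one probe at a time; everything else fails fast.
                    raise CircuitOpenError("probe already in flight")
                self._probe_in_flight = True
                return True
            return False

    def _record(self, succeeded, was_probe):
        with self._lock:
            now = self.clock()
            if was_probe:
                # The probe alone decides: close on success, reopen on failure.
                self._probe_in_flight = False
                self._results.clear()
                self.state = "closed" if succeeded else "open"
                self._opened_at = now
                return
            if self.state != "closed":
                return  # late result from a call admitted before tripping
            self._results.append((now, succeeded))
            while self._results and now - self._results[0][0] > self.window_s:
                self._results.popleft()
            total = len(self._results)
            failures = sum(1 for _, ok in self._results if not ok)
            if total >= self.min_calls and failures / total >= self.failure_ratio:
                self.state = "open"
                self._opened_at = now

    def call(self, fn, fallback=None):
        """Run fn through the breaker; use fallback on rejection or failure."""
        try:
            was_probe = self._admit()
        except CircuitOpenError:
            if fallback is not None:
                return fallback()
            raise
        try:
            result = fn()
        except Exception:
            self._record(False, was_probe)
            if fallback is not None:
                return fallback()
            raise
        self._record(True, was_probe)
        return result
```

A caller would wrap each downstream request, for example `breaker.call(fetch_profile, fallback=lambda: cached_profile)`, where `fetch_profile` and `cached_profile` are placeholders. The `_probe_in_flight` flag is what enforces the single-probe rule: while the probe is outstanding, every other request is rejected immediately instead of being forwarded to the recovering service.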
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T06:24:12.581691+00:00— report_created — created