Report #68233
[architecture] Cascading failures when external APIs slow down or fail, consuming all worker threads/connections
Implement the circuit breaker pattern: after N failures (e.g. 5) or once a timeout threshold is exceeded, fail fast on subsequent calls for a cooldown period (e.g. 30s), returning a degraded response or a cached value instead; a half-open state then tests recovery by letting single trial requests through.
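A minimal Python sketch of the state machine described above, assuming an in-process, thread-safe breaker that counts any exception raised by the wrapped call (including a TimeoutError from a client-side timeout) as a failure and resets the count on success. The names CircuitBreaker, CircuitOpenError, call() and the fallback parameter are illustrative assumptions, not any particular library's API; the injectable clock exists only so the 30s cooldown can be tested without sleeping.

```python
import threading
import time
from enum import Enum
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class State(Enum):
    CLOSED = "closed"        # normal operation: calls pass through
    OPEN = "open"            # tripped: calls fail fast until the cooldown ends
    HALF_OPEN = "half_open"  # cooldown over: a single trial call is admitted


class CircuitOpenError(Exception):
    """Raised when a call is rejected because the breaker is open."""


class CircuitBreaker:
    def __init__(self, failure_threshold: int = 5, reset_timeout: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self._clock = clock  # injectable so tests can control time
        self._lock = threading.Lock()
        self._state = State.CLOSED
        self._failures = 0
        self._opened_at = 0.0
        self._trial_in_flight = False

    @property
    def state(self) -> State:
        with self._lock:
            return self._current_state()

    def _current_state(self) -> State:
        # Caller holds the lock. OPEN becomes HALF_OPEN once the cooldown has elapsed.
        if self._state is State.OPEN and self._clock() - self._opened_at >= self.reset_timeout:
            self._state = State.HALF_OPEN
            self._trial_in_flight = False
        return self._state

    def _acquire_permission(self) -> bool:
        with self._lock:
            state = self._current_state()
            if state is State.CLOSED:
                return True
            if state is State.HALF_OPEN and not self._trial_in_flight:
                self._trial_in_flight = True  # admit exactly one probe request
                return True
            return False  # OPEN, or a probe is already running

    def _trip(self) -> None:
        # Caller holds the lock.
        self._state = State.OPEN
        self._opened_at = self._clock()
        self._failures = 0
        self._trial_in_flight = False

    def _on_success(self) -> None:
        with self._lock:
            if self._state is State.HALF_OPEN:
                self._state = State.CLOSED  # probe succeeded: dependency recovered
            self._failures = 0

    def _on_failure(self) -> None:
        with self._lock:
            if self._state is State.HALF_OPEN:
                self._trip()  # probe failed: start another cooldown
            elif self._state is State.CLOSED:
                self._failures += 1
                if self._failures >= self.failure_threshold:
                    self._trip()
            # Failures reported while already OPEN (calls in flight at trip time) are ignored.

    def call(self, fn: Callable[[], T], fallback: Optional[Callable[[], T]] = None) -> T:
        if not self._acquire_permission():
            if fallback is not None:
                return fallback()  # degraded response or cached value
            raise CircuitOpenError("circuit open: failing fast")
        try:
            result = fn()  # the lock is not held while the dependency is called
        except Exception:
            self._on_failure()
            raise
        self._on_success()
        return result
```

Releasing the lock while the wrapped call runs keeps a slow dependency from serializing every caller behind it. Counting failures until the next success is the simplest trip rule; a rolling failure-rate window is a common alternative.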
Journey Context:
Without circuit breakers, a slow dependency triggers a thread-pool exhaustion cascade (the 'domino effect'). Timeouts alone are insufficient because each call still holds a thread or connection while it waits for the timeout to expire. A circuit breaker is a proxy that monitors failure rates; once tripped, it blocks calls entirely, giving the downstream service time to recover. Critical nuances: use a separate circuit breaker per downstream service (not a global one), distinguish between error types (a 500 indicates an unhealthy dependency, whereas a 404 means the service responded and should not count toward tripping), and implement a half-open state so recovery is detected automatically. Without these safeguards, microservice architectures become fragile as failures propagate from one service to the next.
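The per-service and error-classification nuances can be sketched as a small registry that keeps one breaker per downstream service name and counts only 5xx responses or raised exceptions (timeouts, connection errors) as failures, so a 404 from a healthy service never trips anything. This is a hypothetical, self-contained illustration: BreakerRegistry, counts_as_failure, and the convention that request() returns a (status_code, body) tuple are assumptions, and the embedded breaker is deliberately compact (one half-open probe; results of ordinary calls that finish after a trip are ignored).

```python
import threading
import time
from typing import Any, Callable, Dict, Optional, Tuple

Response = Tuple[int, Any]  # hypothetical convention: (HTTP status code, body)


def counts_as_failure(status: int) -> bool:
    # 5xx: the dependency itself is unhealthy, so it counts toward tripping.
    # 4xx (e.g. 404): the dependency answered a bad request correctly;
    # it is healthy, so this must not trip the breaker.
    return status >= 500


class _ServiceBreaker:
    """Compact breaker state for one downstream service (guarded by the registry lock)."""

    def __init__(self, threshold: int, cooldown: float, clock: Callable[[], float]) -> None:
        self.threshold, self.cooldown, self.clock = threshold, cooldown, clock
        self.failures = 0
        self.opened_at: Optional[float] = None  # None means closed
        self.probe_in_flight = False

    def admit(self) -> Optional[str]:
        """Return "normal" or "probe" if the call may proceed, or None to fail fast."""
        if self.opened_at is None:
            return "normal"
        if self.clock() - self.opened_at < self.cooldown or self.probe_in_flight:
            return None
        self.probe_in_flight = True  # half-open: exactly one trial request
        return "probe"

    def record(self, kind: str, failed: bool) -> None:
        if kind == "probe":
            self.probe_in_flight = False
            if failed:
                self.opened_at = self.clock()  # still down: start a new cooldown
            else:
                self.opened_at, self.failures = None, 0  # recovered: close
        elif self.opened_at is None:  # non-probe results arriving while open are ignored
            self.failures = self.failures + 1 if failed else 0
            if self.failures >= self.threshold:
                self.opened_at = self.clock()


class BreakerRegistry:
    """One breaker per downstream service name; a failing service never trips another."""

    def __init__(self, threshold: int = 5, cooldown: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._lock = threading.Lock()
        self._breakers: Dict[str, _ServiceBreaker] = {}
        self._threshold, self._cooldown, self._clock = threshold, cooldown, clock

    def call(self, service: str, request: Callable[[], Response],
             fallback: Callable[[], Response]) -> Response:
        with self._lock:
            breaker = self._breakers.get(service)
            if breaker is None:
                breaker = _ServiceBreaker(self._threshold, self._cooldown, self._clock)
                self._breakers[service] = breaker
            kind = breaker.admit()
        if kind is None:
            return fallback()  # fail fast with a degraded or cached response
        try:
            status, body = request()  # the lock is not held during the remote call
        except Exception:  # timeouts and connection errors count as failures
            with self._lock:
                breaker.record(kind, failed=True)
            raise
        with self._lock:
            breaker.record(kind, counts_as_failure(status))
        return status, body
```

Keying breakers by service name means a dependency that is timing out only fails fast for its own callers, while calls to every other service keep flowing normally.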
Lifecycle
2026-06-20T21:01:02.028601+00:00— report_created — created