Report #94712
[architecture] Cascading failures when synchronous calls to external services time out
Use circuit breakers (fail fast and return a degraded response) at synchronous boundaries, and reserve asynchronous queues for work that needs durability or throughput. Do not chain synchronous calls deeper than 2-3 levels; convert longer chains to async with polling or callbacks.
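A minimal sketch of the circuit breaker idea, assuming a single-threaded caller and a wrapped call that enforces its own timeout. The names CircuitBreaker, failure_threshold, reset_timeout and the fallback callable are illustrative, not taken from any particular library.

```python
import time


class CircuitBreaker:
    """Illustrative breaker: opens after N consecutive failures, retries after a cooldown."""

    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold  # hypothetical default
        self.reset_timeout = reset_timeout          # cooldown in seconds (hypothetical default)
        self.clock = clock                          # injectable so the cooldown can be tested
        self.failures = 0
        self.opened_at = None                       # None means the circuit is closed

    def call(self, func, fallback):
        # Open circuit: fail fast with the degraded response, do not touch the dependency.
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout:
                return fallback()
            # Cooldown elapsed: fall through and allow one trial call (half-open).
        try:
            result = func()
        except Exception:
            self.failures += 1
            # A failed trial call, or reaching the threshold, (re)opens the circuit.
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()
            return fallback()
        # Success closes the circuit and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

In use, func would be the remote call (with its own timeout) and fallback would return a cached value or an error payload. A production breaker would also need locking for concurrent callers and metrics on state changes.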
Journey Context:
Architects often confuse 'async' (non-blocking I/O) with 'asynchronous processing' (message queues). The critical boundary decision is the failure mode. When Service A calls Service B synchronously (HTTP/gRPC) and B is slow, A's threads and connections stay blocked waiting on B; once A's pool is exhausted, A becomes slow for its own callers and the failure cascades upstream (loosely, a 'thundering herd' in reverse). The fix is the Circuit Breaker pattern: after N failures, A stops calling B and immediately returns a degraded response (cached data or an error), giving B time to recover, then probes B again after a cooldown. This is mandatory for synchronous chains. For durability (work that must not be lost), use true asynchronous message queues (RabbitMQ, SQS), where the caller writes to a queue and moves on. The hard rule: synchronous chains should never exceed depth 2-3 (A->B->C max); beyond that, use async with callbacks or polling. Violating this creates a distributed monolith in which latency and failure probability compound at every hop and deadlock becomes a real risk.
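The queue hand-off with polling described above can be sketched as follows. This uses an in-process queue.Queue purely as a stand-in for a broker such as RabbitMQ or SQS, so it is not durable; the names submit, poll, worker and the results store are hypothetical.

```python
import queue
import threading
import uuid

jobs = queue.Queue()             # stand-in for a real broker; in-memory, not durable
results = {}                     # job_id -> value; stand-in for a result store
results_lock = threading.Lock()


def submit(payload):
    """Caller side: enqueue the work and return at once with a job id."""
    job_id = str(uuid.uuid4())
    jobs.put((job_id, payload))
    return job_id


def poll(job_id):
    """Caller side: report whether the job has finished, without blocking."""
    with results_lock:
        if job_id in results:
            return {"status": "done", "value": results[job_id]}
    return {"status": "pending"}


def worker(handler):
    """Consumer side: process jobs until a None sentinel arrives."""
    while True:
        item = jobs.get()
        if item is None:
            break
        job_id, payload = item
        value = handler(payload)
        with results_lock:
            results[job_id] = value
```

The caller never waits on the downstream service: submit returns immediately, and a slow or stopped worker only delays the result instead of holding the caller's threads. Error handling and retries inside the worker are left out of this sketch.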
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T17:33:23.983754+00:00— report_created — created