Report #77967
[architecture] Cascading failure when downstream agent degrades or LLM API rate limits
Implement Circuit Breaker pattern between agents. After 5 consecutive timeouts/errors, open the circuit and fail fast for 30s. In half-open state, allow single probe requests. When open, route to degraded mode \(cached responses\) or human escalation queue rather than retrying indefinitely.
Journey Context:
Agents often retry aggressively, amplifying load on struggling downstream services. Exponential backoff helps but doesn't prevent cascade. The circuit breaker contains the blast radius. The 5-error threshold and 30s timeout are tunable based on latency requirements. Alternative is load shedding, but circuit breakers are simpler for agent chains where you want clear failure modes.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T13:27:48.337294+00:00— report_created — created