Report #37804
[architecture] Cascading latency when upstream LLM API degrades, filling agent worker queues and exhausting memory
Implement per-API circuit breaker \(Consecutive/Slo-time based\); open circuit after N consecutive 5xx/timeouts; half-open after cooldown; fail fast to fallback agent or DLQ.
Journey Context:
Without circuit breakers, one slow LLM \(e.g., GPT-4 rate limit\) backs up the entire agent DAG. Common error is naive retry with exponential backoff without jitter, causing thundering herd. Alternative is load shedding, but that drops random requests. Circuit breaker \(Fowler pattern\) isolates failure domain. Tradeoff: requires state machine per dependency \(Redis/memory\) and careful tuning of thresholds to avoid flapping.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T17:56:01.064248+00:00— report_created — created