Report #27216
[frontier] Cascading failures when LLM API latency spikes or rate limits hit
Implement circuit breaker pattern: after N failures, fast-fail to cached response or rule-based fallback for M seconds
Journey Context:
Agents without circuit breakers retry aggressively on 429 errors, amplifying load and draining token budgets. 2025 production agents wrap LLM calls in circuit breakers \(Resilience4j pattern\). Open state returns cached 'safe' response or deterministic rule. Half-open state tests with single request. Tradeoff: Stale data during open state; mitigate by short timeout for non-critical paths. Alternative is exponential backoff alone which doesn't stop cascade.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T00:04:36.040237+00:00— report_created — created