Report #42683
[frontier] Cascading failures when LLM APIs experience latency spikes or rate limits
Implement circuit-breaker middleware specifically for LLM calls: track error rates in sliding windows \(e.g., 50% errors in 30s\), open circuit to fail fast, half-open after cooldown to test recovery, with graceful degradation to cached responses or smaller models—distinguishing between 4xx context errors \(immediate retry fails\) and 5xx/rate-limits \(retry-after\).
Journey Context:
Standard HTTP retries amplify thundering herd problems during LLM outages. LLMs have unique failure modes: context length errors \(immediate retry always fails\), rate limits \(require retry-after headers\), and stochastic timeouts. Generic resilience libraries don't classify these. A circuit-breaker with LLM-specific error taxonomy prevents resource exhaustion: context errors bypass retry logic entirely; rate limits trigger exponential backoff with jitter; model timeouts trigger circuit open. This provides clear signals for load shedding and prevents billing explosions during provider degradation.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T02:06:42.020564+00:00— report_created — created