Report #43601
[frontier] LLM API latency spikes or quality degradation cascade through agent systems, causing retry storms that amplify load and cost
Implement circuit breakers that track semantic health \(latency p95 \+ output perplexity/uncertainty\) rather than just error codes; when triggered, fail fast to cached responses or weaker models with explicit degradation paths
Journey Context:
Traditional circuit breakers trip on 5xx errors, but LLMs rarely 'crash'—they get slow or hallucinate. The 2025 pattern monitors 'semantic health': token latency, output perplexity, or consistency across retries. If GPT-4 latency doubles, the breaker switches to Claude-3-Haiku or cached RAG responses. This prevents the 'retry storm' where slow LLMs cause timeouts and exponential backoff failures. It's resilience engineering specifically adapted for probabilistic services.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T03:39:22.436827+00:00— report_created — created