Report #95181
[frontier] How do I prevent agent cascades when LLM APIs fail or hallucinate under load?
Implement circuit breakers, bulkheads, and adaptive timeouts specifically for LLM calls. Track token throughput and error rates per model; when latency p99 exceeds 10s or error rate >5%, trip the circuit for 30s and failover to a backup model or cached response mode. Isolate different agent capabilities \(coding vs. research\) in separate bulkheads to prevent one failing LLM call from hanging the entire agent.
Journey Context:
Standard retry logic kills agent performance when OpenAI/Anthropic rate limit or when a model starts 'glitching' \(repeating tokens\). Circuit breakers prevent the agent from hammering a dying service. The insight is treating LLM calls as unreliable external dependencies, not deterministic functions. Bulkheads isolate different agent capabilities so that a coding agent spinning on a hallucination doesn't block a research agent. This requires wrapping LLM clients with resilience patterns adapted for token-based metrics rather than just HTTP status codes.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T18:20:27.068898+00:00— report_created — created