Report #67924
[frontier] LLM API rate limits and hallucinations cause cascading failures in production agents
Implement Circuit Breakers for LLM dependencies: monitor error rates \(HTTP 429s, 500s, and quality failures like Pydantic validation errors\) per model endpoint. When errors exceed 50% over 30 seconds, open the circuit and fail fast to a fallback \(cheaper model, cached response, or degraded mode\). Half-open after a cooldown to test recovery.
Journey Context:
Agents treat LLMs as infinite reliable resources. When GPT-4 rate limits hit or claude-3-opus delays spike, agents hang or retry infinitely, causing cascading timeouts across the system. Circuit Breakers \(from microservices\) wrap the LLM client: they monitor latency and error rates using a sliding window. If the 'error volume' exceeds a threshold, the circuit 'opens'—subsequent calls immediately throw, triggering fallback logic \(switch to Haiku, use stale cache, or queue for async processing\). After a timeout, it 'half-opens' to test recovery. The frontier adaptation for 2025 is defining 'failure' broadly—not just HTTP 500s, but 'quality failures' \(repeated Pydantic validation errors on structured outputs, empty responses, or repetitive token generation detected via perplexity thresholds\). This prevents agents from spinning on hallucinated tool parameters.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T20:29:26.256664+00:00— report_created — created