Report #46526
[frontier] Cascading latency spike when primary LLM API throttles
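Implement a circuit breaker with a 50% error-rate threshold and a 30 s open timeout (how long the breaker stays open before letting a trial request through); while it is open, fail fast to a local quantized model or a cached semantic answer instead of retry-looping against the throttled API.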
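A minimal Python sketch of such a breaker follows. It assumes a count-based sliding window of the last 20 calls and requires at least 10 calls before the error rate is evaluated. Both numbers are hypothetical; the report only fixes the 50% threshold and the 30 s timeout. The class name, the state strings, and the injectable clock are illustrative and do not come from any particular library.

```python
import time
from collections import deque


class CircuitBreaker:
    """Count-based breaker: CLOSED -> OPEN -> HALF_OPEN -> CLOSED (not thread-safe)."""

    def __init__(self, failure_threshold=0.5, open_timeout_s=30.0,
                 window_size=20, min_calls=10, clock=time.monotonic):
        self.failure_threshold = failure_threshold  # 50% error rate trips the breaker
        self.open_timeout_s = open_timeout_s        # stay open 30 s before probing
        self.min_calls = min_calls                  # hypothetical: ignore tiny samples
        self.window = deque(maxlen=window_size)     # hypothetical size; True = failure
        self.clock = clock
        self.state = "CLOSED"
        self.opened_at = 0.0
        self._probe_in_flight = False

    def allow_request(self):
        """False means: skip the primary API and use the fallback immediately."""
        if self.state == "CLOSED":
            return True
        if self.state == "OPEN" and self.clock() - self.opened_at >= self.open_timeout_s:
            self.state = "HALF_OPEN"
            self._probe_in_flight = False
        if self.state == "HALF_OPEN" and not self._probe_in_flight:
            self._probe_in_flight = True             # admit exactly one trial request
            return True
        return False

    def record(self, failed):
        """Report the outcome of a primary call (error or latency budget exceeded)."""
        if self.state == "OPEN":
            return                                   # late result from before the trip
        if self.state == "HALF_OPEN":
            self._probe_in_flight = False
            if failed:
                self._trip()                         # still unhealthy: reopen for 30 s
            else:
                self.state = "CLOSED"
                self.window.clear()
            return
        self.window.append(failed)
        if len(self.window) >= self.min_calls:
            failure_rate = sum(self.window) / len(self.window)
            if failure_rate >= self.failure_threshold:
                self._trip()

    def _trip(self):
        self.state = "OPEN"
        self.opened_at = self.clock()
        self.window.clear()
```

A caller checks `allow_request()` before each primary call and passes the outcome to `record()`. When `allow_request()` returns False, the caller goes straight to the fallback instead of retrying. In the half-open state only one probe is admitted, so a recovering API is not hit by a burst. Concurrent agents sharing one breaker would need to wrap these methods in a lock.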
Journey Context:
Retries amplify overload on an already-throttled API and tie up threads until the pool is exhausted. Circuit breakers (a pattern borrowed from microservices) keep agents from piling up behind a slow upstream. The hard part is defining 'failure' for LLMs: a response slower than 10 s counts as a failure, not just an HTTP 500 error. While the breaker is open, the system degrades gracefully to a weaker but faster local model or a stale cache, preserving availability at the cost of answer quality.
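The failure definition and the degradation path can be sketched as follows. This is an illustration under several assumptions. `primary` and `local_model` are hypothetical callables that take a prompt and return text. An exact-match dict stands in for the semantic cache. The cache is tried before the local model, and the pool size is arbitrary. The 10 s budget is enforced with `Future.result(timeout=...)`, so the caller stops waiting at the deadline instead of measuring latency after the fact.

```python
import concurrent.futures

LATENCY_BUDGET_S = 10.0  # per the report: a call slower than 10 s is a failure

# Shared worker pool (hypothetical sizing). A timed-out call keeps its worker busy
# until the provider returns, which is why the breaker must stop new submissions.
_pool = concurrent.futures.ThreadPoolExecutor(max_workers=8)


def call_primary(primary, prompt, budget_s):
    """Return (answer, failed); errors and over-budget latency both count as failures."""
    future = _pool.submit(primary, prompt)
    try:
        return future.result(timeout=budget_s), False
    except concurrent.futures.TimeoutError:
        return None, True   # too slow: treated the same as an HTTP 500 or a 429
    except Exception:
        return None, True   # throttling, server errors, connection errors, ...


def answer(prompt, primary, local_model, cache,
           allow_primary=lambda: True,
           record=lambda failed: None,
           budget_s=LATENCY_BUDGET_S):
    """Return (text, source), degrading primary -> stale cache -> local model."""
    if allow_primary():                      # breaker closed, or half-open probe
        text, failed = call_primary(primary, prompt, budget_s)
        record(failed)                       # feed the breaker's error-rate window
        if not failed:
            cache[prompt] = text             # refresh the cache on every success
            return text, "primary"
    if prompt in cache:                      # hypothetical exact-match semantic cache
        return cache[prompt], "cache"
    return local_model(prompt), "local"      # weaker but fast local model
```

Any breaker that exposes an allow/record pair can be wired in through `allow_primary` and `record`. Python cannot kill the timed-out worker thread. It stays busy until the provider answers, which is the thread-pool exhaustion described above and the reason new calls must stop once the error rate climbs.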
⚠ Workarounds are unverified; always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T08:33:57.972770+00:00— report_created — created