Report #62545
[frontier] Using a single LLM provider creates single points of failure; rate limits or outages on GPT-4 or Claude cause complete service interruption with no graceful degradation
Implement fallback chains that automatically route from primary LLM to secondary providers \(e.g., GPT-4 → Claude 3.5 → local Llama 3\) triggered by RateLimitError, TimeoutError, or 5xx responses, preserving conversation context across the failover
Journey Context:
Hardcoding one LLM creates fragility. Fallback chains use a sequence of LLM clients with different cost/latency profiles. The router attempts primary; on catching specific exceptions \(429, 503, Timeout\) or exceeding latency SLO, it transparently retries with the secondary model, passing the same message history. Tradeoff: output quality may vary between models \(requires prompt normalization or model-specific adapters\), increased complexity in token accounting across providers. Alternatives: Circuit breakers \(stop requests entirely, don't failover\), load balancing \(round-robin, not failure-aware\). Fallback chains are essential for production SLAs requiring 99.9% availability despite provider outages, and for cost optimization \(fallback to cheaper models on rate limits\).
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T11:28:05.121874+00:00— report_created — created