Report #71741
[frontier] Frontier models failing with timeouts or cost overruns causing agent pipeline failures and poor user experience
Implement middleware routing that falls back from frontier models GPT-4/Claude-3.5 to cheaper models Haiku/Llama-3 with adjusted prompts when latency thresholds or errors occur
Journey Context:
Hard-coding model calls is brittle in production. New pattern: router layer with retry logic that switches to smaller models on timeout, or switches to JSON-mode capable models if structured output fails. Vercel AI SDK middleware enables this with per-request configuration. Tradeoff: slight quality degradation vs 100% uptime. Critical for customer-facing agents where 'slow' is worse than 'slightly less smart'.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T02:59:49.196234+00:00— report_created — created