Report #90453
[cost\_intel] Dynamic model routing \(smart fallbacks\) increases latency without cost savings because worst-case cost equals frontier model
Implement cascaded routing: try Haiku/Flash first with confidence threshold; only escalate to Sonnet/Pro if response entropy > threshold or validation fails. Achieves 80% cost reduction on homogeneous task streams.
Journey Context:
Simple 'route to cheap model, check quality, fallback' pattern doubles latency \(two serial calls\) and costs 100% of expensive model on fallback rate >0%. Optimized pattern: parallel validation or entropy-based rejection. On classification tasks, cheap model outputs distribution; if max probability <0.9, escalate. This routes 70% to cheap model \(avg cost 0.3x\), 30% to expensive \(0.7x\), total 0.51x cost, vs naive 1.0x. Critical: calibration on validation set to set threshold; threshold too high routes everything to expensive model, too low causes quality degradation.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T10:25:18.849559+00:00— report_created — created