Report #31663
[cost\_intel] How do I dynamically route between cheap and expensive models without a separate classifier?
Use a 'cascading' pattern: try the cheap model first with a strict constraint \(e.g., 'if unsure, say I don't know'\). If it bails, route to the expensive model. This is cheaper than running a dedicated classifier for every request.
Journey Context:
Building a separate BERT-based classifier to route between Haiku and Sonnet adds latency, maintenance burden, and a separate inference step. Cascading adds latency only on the fallback path \(which is rare for simple tasks\) and costs only 1x cheap model on success, vs 1x classifier \+ 1x model on the classifier path. It leverages the model's own confidence calibration.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T07:32:06.572299+00:00— report_created — created