Report #92783
[cost_intel] Using one model for all requests instead of routing based on task complexity
Implement a lightweight classifier or rule-based router that sends simple requests to cheap models and complex ones to frontier models. For mixed workloads, this typically reduces total cost by 40-60% with under 2% quality degradation. Default to the frontier model on uncertain routing decisions.
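A minimal sketch of the rule-based option follows. It assumes each request carries its text, an optional task type, and an optional caller-supplied complexity flag. The model identifiers `cheap-model` and `frontier-model`, the `SIMPLE_TASKS` set, and the 2,000-character cutoff are hypothetical placeholders, not values from this report. Any request that cannot be positively identified as simple falls through to the frontier model, which is how "default to the frontier model on uncertain routing decisions" shows up in code.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical model identifiers; substitute whatever your provider exposes.
CHEAP_MODEL = "cheap-model"
FRONTIER_MODEL = "frontier-model"

# Task types assumed simple enough for the cheap model (illustrative list).
SIMPLE_TASKS = {"classification", "extraction", "formatting"}

# Illustrative input-length cutoff; tune it against your own traffic.
MAX_SIMPLE_CHARS = 2000


@dataclass
class Request:
    text: str
    task_type: Optional[str] = None   # e.g. "classification"; None if unknown
    complexity: Optional[str] = None  # explicit caller flag: "simple" or "complex"


def route(req: Request) -> str:
    """Pick a model; every unresolved case falls back to the frontier model."""
    # 1. An explicit complexity flag from the caller takes precedence.
    if req.complexity == "complex":
        return FRONTIER_MODEL
    if req.complexity == "simple":
        return CHEAP_MODEL
    # 2. A known-simple task type with short input goes to the cheap model.
    if req.task_type in SIMPLE_TASKS and len(req.text) <= MAX_SIMPLE_CHARS:
        return CHEAP_MODEL
    # 3. Unknown task type or long input counts as uncertain: use the frontier model.
    return FRONTIER_MODEL
```

For example, `route(Request("Extract the order ID from: ...", task_type="extraction"))` returns `"cheap-model"`, while `route(Request("Why does my VPN keep dropping?"))` returns `"frontier-model"` because no task type or flag identifies it as simple. Letting the caller's flag override the length check is a design choice here; a stricter variant would apply the length cutoff to flagged requests too.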
Journey Context:
Most production workloads have a long-tail complexity distribution: 70-80% of requests are simple (classification, extraction, formatting) and 20-30% require frontier reasoning. A router can be (1) rule-based, routing on input length, task type, or an explicit complexity flag from the caller; (2) a small classifier trained on complexity-labeled requests; or (3) a cascade that tries the cheap model first, checks output confidence, and escalates to the frontier model when the check fails.

Real-world result: a customer support pipeline that routed simple FAQ matches to Haiku and complex troubleshooting to Sonnet cut cost by 55% (from $0.08/request to $0.036/request), with quality staying within 2% of the all-Sonnet baseline.

The key risk is misrouting complex requests to the cheap model, which produces the degradation signatures described in the quality-cliff entry. Mitigate this by defaulting to the frontier model whenever the routing decision is uncertain. This reflects an asymmetric loss: sending a complex request to the cheap model (false-cheap) is treated as 10x worse than sending a simple request to the frontier model (false-expensive). The router itself should be nearly free; a regex-based or tiny-model classifier adds under 0.1% to total cost.
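The 10:1 asymmetric loss translates directly into a routing threshold. If p is the router's estimated probability that a request is simple, routing to the cheap model has expected loss (1 - p) × 10 and routing to the frontier model has expected loss p × 1, so the cheap model wins only when p > 10/11 ≈ 0.91. The sketch below encodes that threshold for classifier-style routing (option 2) and reuses it as the acceptance bar in a cascade (option 3). It is a sketch under stated assumptions: the `ModelFn` interface returning `(answer, confidence)` is hypothetical, and how you obtain a confidence score (a verifier, log-probabilities, a self-check) is left open.

```python
from typing import Callable, Tuple

# Hypothetical interface: a model call takes a prompt and returns
# (answer, confidence in [0, 1]). How confidence is produced is up to you.
ModelFn = Callable[[str], Tuple[str, float]]


def cheap_threshold(false_cheap_cost: float = 10.0,
                    false_expensive_cost: float = 1.0) -> float:
    """Smallest P(simple) above which the cheap model has lower expected loss.

    loss(cheap)    = (1 - p) * false_cheap_cost
    loss(frontier) = p * false_expensive_cost
    cheap wins when p > false_cheap_cost / (false_cheap_cost + false_expensive_cost)
    """
    return false_cheap_cost / (false_cheap_cost + false_expensive_cost)


def route_by_probability(p_simple: float, threshold: float) -> str:
    """Classifier-style routing: only a confident 'simple' goes to the cheap model."""
    # Strict inequality, so ties and anything below the bar go to the frontier model.
    return "cheap" if p_simple > threshold else "frontier"


def cascade(prompt: str, cheap: ModelFn, frontier: ModelFn,
            threshold: float) -> Tuple[str, str]:
    """Try the cheap model first; escalate if its confidence does not clear the bar."""
    answer, confidence = cheap(prompt)
    if confidence > threshold:
        return answer, "cheap"
    # Escalation: the cheap answer is discarded and the frontier model is called.
    answer, _ = frontier(prompt)
    return answer, "frontier"
```

With the default 10:1 ratio, `route_by_probability(0.9, cheap_threshold())` still returns `"frontier"`, since 0.9 is below 10/11. Note that a cascade pays for the cheap call even when it escalates, so reusing the same bar as the cascade's acceptance threshold is a simplification; a cascade tuned for cost would fold that extra call into its own threshold.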
⚠ Workarounds are unverified; always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T14:19:29.650257+00:00— report_created — created