Report #88528
[cost_intel] Uniform model usage ignores 10x cost savings from query complexity routing
Implement a complexity classifier using Haiku or 4o-mini to route queries: simple factual lookups to 4o-mini, synthesis to GPT-4o, and complex multi-hop reasoning to o1. The claimed result is 90th-percentile speed comparable to the cheap models with 95th-percentile accuracy comparable to the expensive ones.
Journey Context:
The naive pattern uses one model for all queries, which either burns budget on simple questions or fails on hard ones. FrugalGPT and recent routing research show that a lightweight classifier (Haiku or 4o-mini) can predict query complexity with >90% accuracy from three signals: token count, the presence of words such as 'why' or 'compare', and entity density. Route Tier 1 (factual lookup) to 4o-mini, Tier 2 (synthesis) to 4o, and Tier 3 (planning/proof) to o1. This cuts average cost by 70% while maintaining high accuracy on the long tail. The sign that you need routing is high variance in response quality within your workload: some queries are answered well and others poorly by the same model.
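The tiering above can be sketched in a few lines of Python. This is a minimal illustration, not the report's implementation: it swaps the LLM classifier for a rule-based stand-in that uses the three signals named above, and every threshold, the cue-word list, the capitalized-word entity proxy, and the helper names (`extract_features`, `classify_tier`, `route`, `blended_cost`) are assumptions that would need tuning against a labeled sample of your own traffic. Only the tier-to-model mapping comes from the report.

```python
import re
from dataclasses import dataclass

# Tier -> model mapping as described in the report.
TIER_MODELS = {1: "gpt-4o-mini", 2: "gpt-4o", 3: "o1"}

# Hypothetical cue words hinting at reasoning-heavy queries.
REASONING_CUES = re.compile(r"\b(why|compare|prove|plan|derive)\b", re.IGNORECASE)


@dataclass
class Features:
    tokens: int            # whitespace tokens, a stand-in for model tokens
    cue_hits: int          # number of reasoning cue words found
    entity_density: float  # capitalized non-initial words / tokens


def extract_features(query: str) -> Features:
    words = query.split()
    tokens = len(words)
    cue_hits = len(REASONING_CUES.findall(query))
    # Crude entity proxy: capitalized words, skipping the sentence-initial word.
    entities = sum(1 for w in words[1:] if w[:1].isupper())
    density = entities / tokens if tokens else 0.0
    return Features(tokens, cue_hits, density)


def classify_tier(query: str) -> int:
    """Map a query to tier 1, 2, or 3 using assumed, untuned thresholds."""
    f = extract_features(query)
    score = 0
    if f.tokens > 30:
        score += 1
    if f.cue_hits >= 1:
        score += 1
    if f.cue_hits >= 2 or f.entity_density > 0.25:
        score += 1
    return min(3, 1 + score)


def route(query: str) -> str:
    """Return the model name the query should be sent to."""
    return TIER_MODELS[classify_tier(query)]


def blended_cost(mix: dict[int, float], unit_cost: dict[int, float]) -> float:
    """Average cost per query given a tier traffic mix (fractions summing to 1)
    and a per-tier cost; compare with unit_cost[3] to estimate savings."""
    return sum(share * unit_cost[tier] for tier, share in mix.items())


if __name__ == "__main__":
    for q in (
        "What is the capital of France?",
        "Compare the pricing tiers of the two plans and summarize the differences",
        "Why does the proof fail, and compare it with a plan that would prove the lemma?",
    ):
        print(route(q), "<-", q)
```

In practice `classify_tier` would be replaced by a call to the small classifier model, with the rule-based version kept as a fallback when that call fails or times out. `blended_cost` is only the arithmetic behind the savings claim: plug in your own measured tier mix and per-tier prices rather than assumed ones.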
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T07:10:38.110515+00:00— report_created — created