Report #57380
[cost_intel] Model routing by task complexity: the 3-5x cost reduction most pipelines miss
Implement two-tier routing: use heuristic task classification (task type, input length, number of files affected) to route simple tasks to cheap models and complex tasks to frontier models. Start with rule-based routing before adding ML-based classification. Typical pipelines see 3-5x cost reduction with <2% quality degradation.
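A minimal sketch of the rule-based router, using the thresholds listed under Journey Context (more than 3 files, descriptions over 2K tokens, or a 'bug' or 'architecture' label). The `Task` fields, the tier names, and the reading of "2K" as 2000 tokens are assumptions made for illustration, not part of any particular pipeline.

```python
from dataclasses import dataclass

# Hypothetical tier names; map them to whichever models your pipeline uses.
CHEAP = "cheap"
FRONTIER = "frontier"

# Thresholds follow the heuristics in this report; tune them on your own data.
MAX_FILES_FOR_CHEAP = 3
MAX_DESCRIPTION_TOKENS_FOR_CHEAP = 2000  # assumption: "2K" read as 2000
FRONTIER_LABELS = frozenset({"bug", "architecture"})


@dataclass(frozen=True)
class Task:
    """Illustrative task record; field names are assumptions."""
    description_tokens: int
    files_touched: int
    labels: frozenset = frozenset()


def route(task: Task) -> str:
    """Return the tier for a task. Any single rule firing sends it to frontier."""
    if task.files_touched > MAX_FILES_FOR_CHEAP:
        return FRONTIER
    if task.description_tokens > MAX_DESCRIPTION_TOKENS_FOR_CHEAP:
        return FRONTIER
    if task.labels & FRONTIER_LABELS:
        return FRONTIER
    # Nothing flagged the task as complex, so the cheap tier handles it.
    return CHEAP


if __name__ == "__main__":
    print(route(Task(description_tokens=150, files_touched=1)))
    print(route(Task(description_tokens=150, files_touched=5)))
```

The rules are ordered so that any one of them is enough to escalate. That biases errors toward overspending on the frontier tier, which is the cheaper kind of mistake when misrouted complex tasks fail silently.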
Journey Context:
In a typical code pipeline, ~70% of tasks are simple (formatting, boilerplate, single-function fixes, documentation) and ~30% require deep reasoning. Compare running everything through Sonnet at $3/M tokens with routing the simple 70% to Haiku at $0.80/M: the blended cost drops from $3/M to $0.80×0.7 + $3×0.3 = $1.46/M, about a 2x reduction. With GPT-4o-mini ($0.15/M) as the simple tier and a $2.50/M frontier tier: $0.15×0.7 + $2.50×0.3 = $0.86/M, about a 3.5x reduction against the $3/M baseline (about 2.9x against running everything at $2.50/M).

The critical challenge is misrouting: complex tasks sent to the cheap model fail silently. Even a 10% misclassification rate on complex tasks can produce a long tail of bugs that costs more to fix than routing saves. Start with simple heuristics: tasks touching >3 files, tasks with >2K-token descriptions, or tasks labeled 'bug' or 'architecture' go to frontier. Only add ML-based routing once you have 10K+ labeled examples of correct routing decisions.

The ROI inflection: routing becomes worthwhile at >10K tasks/month, where the 3-5x savings exceed the engineering cost of maintaining the router.
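The blended-cost arithmetic above can be reproduced with a short sketch. The function names are hypothetical. It assumes the 70/30 split applies to token volume as well as task count, which holds only if simple and complex tasks consume similar numbers of tokens.

```python
def blended_cost(simple_share: float, cheap_price: float, frontier_price: float) -> float:
    """Average price per million tokens when simple_share of traffic uses the cheap tier."""
    if not 0.0 <= simple_share <= 1.0:
        raise ValueError("simple_share must be between 0 and 1")
    return simple_share * cheap_price + (1.0 - simple_share) * frontier_price


def reduction_factor(baseline_price: float, simple_share: float,
                     cheap_price: float, frontier_price: float) -> float:
    """How many times cheaper the routed pipeline is than the all-baseline one."""
    return baseline_price / blended_cost(simple_share, cheap_price, frontier_price)


if __name__ == "__main__":
    # Prices are the per-million-token figures quoted in this report.
    scenarios = [
        ("Haiku + Sonnet vs $3.00 baseline", 3.00, 0.80, 3.00),
        ("4o-mini + $2.50 tier vs $3.00 baseline", 3.00, 0.15, 2.50),
        ("4o-mini + $2.50 tier vs $2.50 baseline", 2.50, 0.15, 2.50),
    ]
    for name, baseline, cheap, frontier in scenarios:
        cost = blended_cost(0.7, cheap, frontier)
        factor = reduction_factor(baseline, 0.7, cheap, frontier)
        print(f"{name}: ${cost:.3f}/M, {factor:.2f}x")
```

Varying `simple_share` shows how sensitive the savings are to the fraction of traffic that can safely go to the cheap tier, which is worth checking before committing engineering time to a router.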
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T02:47:57.695741+00:00— report_created — created