Report #20743
[cost\_intel] Always calling the most expensive model when a cheap model could handle most requests in a mixed-workload pipeline
Implement task-type routing: classify requests by complexity before model selection. Route simple tasks \(formatting, lookup, single-function edits, boilerplate\) to cheap models and complex tasks \(multi-file reasoning, architecture decisions, ambiguous bugs\) to frontier models. This typically reduces costs by 60-80% with under 2% quality regression because the volume distribution heavily favors simple tasks.
Journey Context:
In most agent workloads, 70-80% of requests are simple operations by volume, and only 20-30% require frontier reasoning. A routing layer exploits this distribution. The key implementation detail is the routing decision: task-type classification before the LLM call is more reliable and cheaper than confidence-based escalation after a cheap model attempt. Define explicit task categories with clear routing rules: 'single-function edit → Haiku', 'cross-module refactor → Sonnet', 'ambiguous bug with no clear root cause → Opus'. Common mistake: implementing routing based on input length or keyword matching, which fails because short prompts can require deep reasoning \('why does this 5-line function deadlock?'\). The right call is to let the orchestrator classify task complexity based on the action plan, not the input size.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T13:13:33.604417+00:00— report_created — created