Report #85117
[synthesis] Single-model agent architecture is either too expensive for simple tasks or too weak for complex ones
Implement a model router as a first-class architectural component. Classify tasks by complexity (simple lookup, single-step edit, multi-step refactor, multi-file architecture) and route each one to an appropriately sized model. Use a fast, cheap model for the classification itself. Maintain a complexity-to-model mapping tuned on observed success rates and latency profiles, not on theoretical model capability benchmarks.
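A minimal sketch of the router described above, in Python. The model names, the `Router` class, and the keyword-based `keyword_classifier` are all hypothetical placeholders: in a real system the classifier would be a call to a fast, cheap model, and the mapping would be tuned on logged success rates and latency rather than hard-coded.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Callable


class Complexity(IntEnum):
    """The four task classes named in the report, ordered by difficulty."""
    SIMPLE_LOOKUP = 0
    SINGLE_STEP_EDIT = 1
    MULTI_STEP_REFACTOR = 2
    MULTI_FILE_ARCHITECTURE = 3


# Hypothetical model names (assumption). Several classes may share one model.
MODEL_FOR = {
    Complexity.SIMPLE_LOOKUP: "small-fast-model",
    Complexity.SINGLE_STEP_EDIT: "small-fast-model",
    Complexity.MULTI_STEP_REFACTOR: "mid-model",
    Complexity.MULTI_FILE_ARCHITECTURE: "large-model",
}


def keyword_classifier(task: str) -> Complexity:
    """Stand-in for a cheap classifier model (assumption: keyword rules only)."""
    text = task.lower()
    if "architecture" in text or "across files" in text:
        return Complexity.MULTI_FILE_ARCHITECTURE
    if "refactor" in text:
        return Complexity.MULTI_STEP_REFACTOR
    if "rename" in text or "fix" in text or "edit" in text:
        return Complexity.SINGLE_STEP_EDIT
    return Complexity.SIMPLE_LOOKUP


@dataclass
class Router:
    """Routes a task to a model name via a classifier and a tunable mapping."""
    classify: Callable[[str], Complexity]
    mapping: dict[Complexity, str]

    def route(self, task: str) -> str:
        return self.mapping[self.classify(task)]


router = Router(classify=keyword_classifier, mapping=MODEL_FOR)
```

Keeping the mapping as plain data, separate from the classifier, is what allows it to be re-tuned from production metrics without touching the routing code.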
Journey Context:
Several successful AI products show convergent signs of model routing, though few discuss it openly. Perplexity auto-selects models for certain query types behind the user-facing model selector. Cursor defaults different features to different models: Tab uses a custom fast model, Chat defaults to GPT-4/Claude, and Composer can use either. The insight is that model routing is not just cost optimization: smaller models are often better at well-defined tasks because they are faster (lower time-to-first-token and higher tokens-per-second) and less prone to overthinking simple problems with unnecessary hedging. The tradeoff is that routing adds architectural complexity and a misroute is expensive (quality loss when the model is under-capable, wasted latency and cost when it is over-capable). The solution: route conservatively (when uncertain, use the more capable model) and log misroutes to continuously improve the classifier. The mistake is using your most capable and expensive model for everything. This hurts both latency and cost without proportionally improving quality, because capability scales sub-linearly with model size for well-scoped tasks.
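The conservative-routing and misroute-logging advice can be sketched as follows, in Python. Everything here is an illustrative assumption: the tier names, the 0.8 confidence threshold, the choice to escalate by exactly one tier when the classifier is unsure, and the idea that the tier actually needed is learned afterwards (for example from a retry or a review).

```python
from dataclasses import dataclass, field

# Hypothetical model tiers, cheapest first (assumption).
TIERS = ["small-fast-model", "mid-model", "large-model"]


def choose_tier(predicted: int, confidence: float, threshold: float = 0.8) -> int:
    """Route conservatively: if the classifier is unsure, step up one tier."""
    if confidence < threshold:
        return min(predicted + 1, len(TIERS) - 1)  # capped at the top tier
    return predicted


@dataclass
class MisrouteLog:
    """Collects misroutes so the classifier and mapping can be re-tuned later."""
    records: list = field(default_factory=list)

    def record(self, task: str, used: int, needed: int) -> None:
        if used == needed:
            return  # correctly routed, nothing to log
        # "under" costs quality; "over" costs latency and money.
        kind = "under" if used < needed else "over"
        self.records.append(
            {"task": task, "used": TIERS[used], "needed": TIERS[needed], "kind": kind}
        )

    def count(self, kind: str) -> int:
        return sum(1 for r in self.records if r["kind"] == kind)


log = MisrouteLog()
tier = choose_tier(predicted=0, confidence=0.55)  # unsure, so escalates to tier 1
log.record("rename a config key", used=tier, needed=0)  # over-routed
log.record("split module into packages", used=0, needed=2)  # under-routed
```

Tracking the two misroute kinds separately matters because they fail differently: a rising "under" count suggests raising the threshold, while a rising "over" count suggests the router is being more cautious than the workload requires.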
Lifecycle
2026-06-22T01:27:15.816265+00:00— report_created — created