Report #68465
[synthesis] Using a single large model for all tasks — slow, expensive, and still fails at simple tasks
Implement a model router as a first-class architectural component that maps task types to model capabilities. The router should consider: (1) task complexity (autocomplete vs. multi-step reasoning); (2) latency budget (interactive vs. background); (3) output modality (text vs. structured tool calls); (4) cost tolerance. Route simple, latency-sensitive tasks to small fast models and complex multi-step tasks to large capable models. The router defines your product's capability surface: the combinations of speed, quality, and cost you can offer.
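A minimal sketch of such a router, assuming a rule-based design that picks the cheapest model satisfying all four criteria. The model names, latency figures, and costs below are hypothetical placeholders, not benchmarks or real API identifiers, and a production router might use learned classifiers or fallbacks instead of hard filters.

```python
from dataclasses import dataclass
from enum import Enum


class Complexity(Enum):
    SIMPLE = 1       # e.g. autocomplete
    MODERATE = 2     # e.g. single-turn Q&A
    MULTI_STEP = 3   # e.g. multi-step reasoning or edits


@dataclass(frozen=True)
class Task:
    complexity: Complexity
    latency_budget_ms: int      # interactive tasks have small budgets
    needs_tool_calls: bool      # structured tool calls vs. plain text
    max_cost_per_call: float    # cost tolerance, arbitrary units


@dataclass(frozen=True)
class ModelProfile:
    name: str
    max_complexity: Complexity
    typical_latency_ms: int
    supports_tool_calls: bool
    cost_per_call: float


# Hypothetical catalog; names and numbers are illustrative assumptions.
CATALOG = [
    ModelProfile("small-fast", Complexity.SIMPLE, 150, False, 0.001),
    ModelProfile("medium", Complexity.MODERATE, 1200, True, 0.01),
    ModelProfile("large-capable", Complexity.MULTI_STEP, 8000, True, 0.10),
]


def route(task: Task, catalog=CATALOG) -> str:
    """Return the cheapest model that satisfies all four constraints."""
    candidates = [
        m for m in catalog
        if m.max_complexity.value >= task.complexity.value
        and m.typical_latency_ms <= task.latency_budget_ms
        and (m.supports_tool_calls or not task.needs_tool_calls)
        and m.cost_per_call <= task.max_cost_per_call
    ]
    if not candidates:
        raise LookupError("no model satisfies the task's constraints")
    return min(candidates, key=lambda m: m.cost_per_call).name


autocomplete = Task(Complexity.SIMPLE, 300, False, 0.005)
agent_edit = Task(Complexity.MULTI_STEP, 60_000, True, 0.50)
print(route(autocomplete))  # small-fast
print(route(agent_edit))    # large-capable
```

Raising an error when no model fits makes gaps in the capability surface explicit: a task that needs multi-step reasoning within an interactive latency budget has no route in this catalog, which is a product decision rather than a silent fallback.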
Journey Context:
The naive approach is to pick the 'best' model and use it for everything. Production AI products commonly route between models, and the routing is not just cost optimization; it is capability definition. Cursor uses different models for Tab (fast autocomplete), Chat (medium-complexity Q&A), and Agent (complex multi-step edits). Perplexity uses different models for search-query generation vs. answer synthesis. The synthesis: the model router is the most important architectural decision because it determines the product's Pareto frontier of speed, quality, and cost. A product that uses GPT-4 for everything is slow and expensive; a product that uses a small model for everything is fast but limited. The router lets you offer both experiences in one product. Architecturally, this means your inference layer must support multiple models, your prompt templates must be model-aware, and your evaluation must be per-route rather than aggregate, since an aggregate score can hide a route that is failing. The router is also where competitive moats form: knowing which model handles which task best is hard-won operational knowledge that cannot be replicated from API docs alone.
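To illustrate why evaluation should be per-route, here is a small sketch with made-up pass/fail records. The route names echo the Tab/Chat/Agent split above, but the data and the `pass_rate_by_route` helper are hypothetical and exist only to show how an aggregate number can mask one weak route.

```python
from collections import defaultdict

# Hypothetical eval records: (route name, whether the output passed its check).
results = [
    ("tab", True), ("tab", True), ("tab", True), ("tab", True),
    ("chat", True), ("chat", True), ("chat", True),
    ("agent", False), ("agent", False), ("agent", True),
]


def pass_rate_by_route(records):
    """Group pass/fail records by route and return each route's pass rate."""
    totals = defaultdict(int)
    passes = defaultdict(int)
    for route_name, passed in records:
        totals[route_name] += 1
        passes[route_name] += int(passed)
    return {name: passes[name] / totals[name] for name in totals}


aggregate = sum(passed for _, passed in results) / len(results)
per_route = pass_rate_by_route(results)

print(f"aggregate: {aggregate:.2f}")
for name, rate in per_route.items():
    print(f"{name}: {rate:.2f}")
```

With this toy data the aggregate pass rate is 0.80 while the agent route passes only one of three cases, which is the kind of gap a single blended metric would not surface.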
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T21:24:09.424952+00:00— report_created — created