Report #83815
[synthesis] Should AI products use one model or route between multiple models for different tasks?
Implement a model routing layer as the first architectural component. Classify task complexity along two axes, scope (single-line vs. multi-file) and interactivity (real-time vs. async), then route each task to the appropriate model tier: fast, cheap models for high-frequency, low-complexity tasks (autocomplete, classification); capable, expensive models for agentic loops (multi-step reasoning, code generation). The router itself should be a lightweight heuristic or tiny classifier, never an LLM call.
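A minimal sketch of such a heuristic router, assuming a task can be described by how many files it touches and whether a user is waiting on the result. The `Task` fields, the tier names, and the way the four quadrants map to tiers are all hypothetical choices made for illustration, not something taken from any of the products discussed in this report.

```python
from dataclasses import dataclass
from enum import Enum


class Scope(Enum):
    SINGLE_LINE = "single_line"
    MULTI_FILE = "multi_file"


class Interactivity(Enum):
    REAL_TIME = "real_time"
    ASYNC = "async"


@dataclass(frozen=True)
class Task:
    files_touched: int      # hypothetical proxy for scope
    user_is_waiting: bool   # hypothetical proxy for interactivity


# Hypothetical tier names; map them to whatever models are actually deployed.
# The quadrant-to-tier assignment is an assumption and should be tuned.
ROUTES = {
    (Scope.SINGLE_LINE, Interactivity.REAL_TIME): "small-fast",
    (Scope.SINGLE_LINE, Interactivity.ASYNC): "small-fast",
    (Scope.MULTI_FILE, Interactivity.REAL_TIME): "medium",
    (Scope.MULTI_FILE, Interactivity.ASYNC): "large-capable",
}


def classify(task: Task) -> tuple[Scope, Interactivity]:
    """Place a task on the two axes using cheap, deterministic checks."""
    scope = Scope.MULTI_FILE if task.files_touched > 1 else Scope.SINGLE_LINE
    inter = Interactivity.REAL_TIME if task.user_is_waiting else Interactivity.ASYNC
    return scope, inter


def route(task: Task) -> str:
    """Return the tier name for a task; a dict lookup, with no LLM call involved."""
    return ROUTES[classify(task)]
```

For example, `route(Task(files_touched=1, user_is_waiting=True))` picks the small tier, while a background multi-file job lands on the most capable one. Keeping the router as a table lookup means it adds essentially no latency to the real-time path.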
Journey Context:
The common mistake is either using the most capable model for everything, which is slow and expensive for simple tasks, or using a cheap model for everything and getting poor results on complex tasks. Cursor's architecture reveals three tiers observable from latency and pricing: autocomplete (~200ms, small model), cmd+k edits (~2s, medium model), and agent mode (most capable model, multi-second, multi-step). Perplexity routes between quick answers and Pro Search (multi-step decomposition). v0 uses different models for initial generation vs. iterative refinement. The synthesis: the routing decision is not 'which model is best' but 'what is the minimum-capability model that reliably solves this task class'. The router is a cost-accuracy Pareto optimizer. Getting this wrong either burns money (over-serving) or burns users (under-serving). The two-axis classification (scope × interactivity) is the decision framework that emerges from comparing the three products' routing strategies.
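The "minimum-capability model that reliably solves this task class" rule can be sketched as a selection over per-tier measurements. This assumes each tier has been evaluated offline on one task class, yielding a success rate and a per-call cost; `TierStats`, `cheapest_reliable`, and the fallback behaviour are hypothetical names and choices, and the numbers in the usage note are placeholders, not measurements.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TierStats:
    name: str
    cost_per_call: float   # any consistent unit, e.g. dollars
    success_rate: float    # fraction of eval cases solved for ONE task class


def cheapest_reliable(tiers: list[TierStats], min_success: float) -> TierStats:
    """Pick the cheapest tier whose success rate meets the reliability bar.

    If no tier meets the bar, fall back to the most reliable one
    (an assumed policy; failing loudly is an equally valid choice).
    """
    if not tiers:
        raise ValueError("at least one tier is required")
    reliable = [t for t in tiers if t.success_rate >= min_success]
    if not reliable:
        return max(tiers, key=lambda t: t.success_rate)
    return min(reliable, key=lambda t: t.cost_per_call)
```

With placeholder stats such as `TierStats("small", 0.001, 0.97)`, `TierStats("medium", 0.01, 0.99)`, and `TierStats("large", 0.1, 0.995)`, a bar of 0.95 selects the small tier and a bar of 0.98 selects the medium one. Raising the bar trades cost for reliability (the over-serving side); lowering it does the reverse (the under-serving side).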
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T23:16:29.861692+00:00 - report_created - created