Report #38576
[synthesis] AI products use a single large model for all tasks, making simple operations expensive and slow
Implement tiered model routing: fast, cheap models for high-frequency, low-stakes tasks (autocomplete, classification, routing, single-line completion) and smart, expensive models for low-frequency, high-stakes tasks (agent loops, multi-step reasoning, complex generation). Route on task complexity signals: a tool use requirement, multi-file scope, and explicit user invocation are all strong signals for the smart tier.
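A minimal sketch of that routing rule, assuming the three signals are already available before the model call. The names `Tier`, `TaskSignals`, and `choose_tier` are hypothetical and do not come from any specific product.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    FAST = "fast"    # cheap, low-latency model
    SMART = "smart"  # capable, more expensive model


@dataclass(frozen=True)
class TaskSignals:
    """Hypothetical complexity signals collected before the model call."""
    needs_tools: bool = False   # the task requires tool use
    file_count: int = 1         # number of files the task touches
    user_invoked: bool = False  # the user explicitly asked (chat, agent mode)


def choose_tier(signals: TaskSignals) -> Tier:
    """Pick the smart tier if any strong signal is present, else the fast tier."""
    if signals.needs_tools or signals.file_count > 1 or signals.user_invoked:
        return Tier.SMART
    return Tier.FAST
```

Treating any single signal as sufficient keeps the rule easy to audit. A weighted score is a possible alternative, but it would need tuning data that this sketch does not assume.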
Journey Context:
Using one model for everything seems architecturally simpler but fails at production scale on both cost and latency. Cursor uses a fast model for inline completions and a capable model for agent mode and chat. GitHub Copilot routes between models based on task type. Perplexity exposes model selection as a per-request parameter with different pricing tiers. The synthesis: the fast path handles 80%+ of interactions at roughly 10% of the cost, and the smart path handles the remaining critical interactions, where quality matters more than speed. The routing heuristic in practice: if the task requires tool use, requires multi-step reasoning, or spans multiple files, route it to the smart model. If it is pattern completion, a single-line suggestion, or classification, route it to the fast model. The tradeoff: model routing adds system complexity and can feel inconsistent to users if the quality gap between tiers is visible at the boundary. Users notice when a task that 'should' be simple gets a poor response because the router misclassified it. Mitigate this by allowing a manual tier override and by logging routing decisions so the heuristic can be tuned.
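The two mitigations and the cost claim can be sketched as follows. This is an illustration under stated assumptions, not a reference implementation: `resolve_tier` and `blended_cost` are hypothetical names, and `blended_cost` reads "roughly 10% of the cost" as a per-request price ratio between the fast and smart models, which the report does not state explicitly.

```python
import logging
from typing import Optional

logger = logging.getLogger("model_router")  # hypothetical logger name

VALID_TIERS = ("fast", "smart")


def resolve_tier(auto_tier: str, override: Optional[str] = None) -> str:
    """Return the tier to use; a manual override always wins over the router."""
    for tier in (auto_tier, override):
        if tier is not None and tier not in VALID_TIERS:
            raise ValueError(f"unknown tier: {tier!r}")
    final = override if override is not None else auto_tier
    # Record both the automatic choice and the final one so that
    # misroutes (auto != final) can be reviewed when tuning the heuristic.
    logger.info("routing auto=%s override=%s final=%s", auto_tier, override, final)
    return final


def blended_cost(fast_share: float, fast_cost_ratio: float) -> float:
    """Total cost relative to sending every request to the smart model.

    Assumption: fast_cost_ratio is the per-request cost of the fast model
    divided by the per-request cost of the smart model.
    """
    return fast_share * fast_cost_ratio + (1.0 - fast_share)
```

Under that assumption, `blended_cost(0.8, 0.1)` comes to about 0.28, meaning the routed system would cost a little under a third of the single-model baseline. Counting log entries where the override differs from the automatic choice gives a rough misclassification signal.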
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T19:13:20.702889+00:00— report_created — created