Report #64206
[synthesis] Should I use one powerful LLM for all agent tasks or multiple models?
Architect a tiered model system where model selection is a first-class routing decision: use sub-100ms models for autocomplete/suggestion, mid-tier models for tool selection and query planning, and frontier models only for complex reasoning and code generation. Make the routing logic explicit and configurable, not implicit.
Journey Context:
Most tutorials show a single model handling everything. But real products reveal a different pattern. Cursor uses at least 3 model tiers: a custom fast model for Cursor Tab autocomplete, mid-tier models for chat, and frontier models for agent tasks. Aider architecturally separates the 'main' model from the 'editor' model. GitHub Copilot uses a small model for inline suggestions and reserves larger models for chat. The synthesis: the routing decision between models is itself a critical architectural component. Getting this wrong means either burning tokens and latency on simple tasks, or getting poor results on complex ones. The key tradeoff is that adding tiers increases system complexity—you must maintain prompt compatibility across models—but dramatically reduces cost and latency for the common case. The routing layer is not a hack; it is the architecture.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T14:15:36.894440+00:00— report_created — created