Report #87024
[synthesis] AI coding agent uses one model for all tasks regardless of complexity or latency requirements
Route requests through tiered models: a fast local or small model for autocomplete (<100ms target), a medium-capability model for single-scope edits, and the largest model with tool use for multi-step agent tasks. Make the routing logic itself a configurable architectural surface.
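A minimal sketch of what a configurable routing surface could look like, assuming three task classes that match the tiers described here. The names `TaskClass`, `Tier`, `Router`, and the model identifiers are hypothetical placeholders, not taken from any real product or API.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Optional


class TaskClass(Enum):
    AUTOCOMPLETE = "autocomplete"
    SINGLE_EDIT = "single_edit"
    AGENT = "agent"


@dataclass(frozen=True)
class Tier:
    model: str           # hypothetical model identifier
    tools_enabled: bool  # only the agent tier gets tool use by default


# Hypothetical defaults; the model names are placeholders.
DEFAULT_ROUTES: Dict[TaskClass, Tier] = {
    TaskClass.AUTOCOMPLETE: Tier("small-local-model", False),
    TaskClass.SINGLE_EDIT: Tier("medium-model", False),
    TaskClass.AGENT: Tier("large-model", True),
}


class Router:
    def __init__(self, overrides: Optional[Dict[TaskClass, Tier]] = None):
        # Overrides are merged over the defaults, so a deployment can
        # re-point one tier without redefining the whole table.
        self.routes = {**DEFAULT_ROUTES, **(overrides or {})}

    def route(self, task: TaskClass) -> Tier:
        return self.routes[task]
```

Keeping the table as data rather than hard-coded branches is one way to make routing something a team can change per deployment; for example, `Router({TaskClass.SINGLE_EDIT: Tier("other-model", False)})` swaps only the middle tier.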
Journey Context:
Cursor's architecture reveals this pattern clearly: autocomplete uses a small speculative model for sub-100ms latency, Cmd+K uses a medium model for single-file edits, and Composer uses the largest available model with tool-use capabilities. The mistake is treating model selection as cost optimization; it is really a latency-accuracy tradeoff per task class. A single large model is too slow for autocomplete, and a small model is too inaccurate for multi-file refactoring. Each tier has a different SLA: autocomplete must respond in under 200ms (ideally under 100ms) or users disable it, single edits need 2-5 seconds, and agent loops tolerate 10-30 seconds. Routing to the wrong tier breaks the product regardless of model quality.
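The per-tier latency figures above can be expressed as explicit budgets and checked at call time. This is a sketch only: it takes the upper bound of each quoted range as the budget, and `LATENCY_BUDGET_S`, `within_budget`, and `timed_call` are hypothetical helper names.

```python
import time

# Budgets in seconds, using the upper bound of each range quoted above.
LATENCY_BUDGET_S = {
    "autocomplete": 0.2,
    "single_edit": 5.0,
    "agent": 30.0,
}


def within_budget(task_class: str, elapsed_s: float) -> bool:
    """Return True if the elapsed time fits the tier's latency budget."""
    return elapsed_s <= LATENCY_BUDGET_S[task_class]


def timed_call(task_class, fn, *args, clock=time.perf_counter, **kwargs):
    """Run fn and report (result, elapsed seconds, budget met?).

    The clock is injectable so the check can be tested without sleeping.
    """
    start = clock()
    result = fn(*args, **kwargs)
    elapsed = clock() - start
    return result, elapsed, within_budget(task_class, elapsed)
```

Logging the third return value per task class is one way to notice that a request is being served by the wrong tier before users start disabling the feature.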
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T04:39:46.865807+00:00— report_created — created