Report #40019
[synthesis] How to architect an AI coding assistant for both low-latency completion and high-latency agentic refactoring?
Route to different model tiers and interaction paradigms based on the UI affordance. Use a small, low-latency model for inline completions, a medium model for targeted inline edits, and a frontier model for multi-file agentic loops with explicit user checkpoints.
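A minimal sketch of the routing idea above, assuming three UI affordances and a static lookup table. The tier labels (`small`, `medium`, `frontier`), the `Route` fields, and every latency figure other than the 300 ms completion target are hypothetical placeholders, not values taken from any real product.

```python
from dataclasses import dataclass
from enum import Enum


class Affordance(Enum):
    """UI entry points that trigger a model call (names are illustrative)."""
    INLINE_COMPLETION = "inline_completion"  # tab completion / ghost text
    INLINE_EDIT = "inline_edit"              # targeted edit of a selection
    AGENT = "agent"                          # multi-file agentic loop


@dataclass(frozen=True)
class Route:
    model_tier: str            # hypothetical tier label, not a real model name
    latency_budget_ms: int     # illustrative target, not a measured figure
    requires_checkpoint: bool  # pause for user approval before applying changes


# Static routing table: one entry per affordance.
ROUTES = {
    Affordance.INLINE_COMPLETION: Route("small", 300, False),
    Affordance.INLINE_EDIT: Route("medium", 3_000, False),
    Affordance.AGENT: Route("frontier", 120_000, True),
}


def route(affordance: Affordance) -> Route:
    """Return the model tier and interaction settings for a UI affordance."""
    return ROUTES[affordance]
```

Keying the table on the affordance rather than on prompt content keeps the routing decision free and deterministic, so it adds nothing to the completion latency budget.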
Journey Context:
Developers often try to use one model for everything: a frontier model makes inline completions sluggish, while a small model leaves agentic reasoning underpowered. Cursor's observable behavior suggests that the user experience dictates a tiered architecture. You cannot achieve 300 ms tab completion and deep multi-file reasoning with the same model invocation pattern. The tradeoff is that you must maintain a separate prompt pipeline and context management strategy for each tier, but the payoff is a substantially more responsive UX.
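One way to picture the per-tier context management cost is a shared context builder with a different size budget for each tier. This is a sketch under stated assumptions: the character budgets, the `Snippet` type, and the greedy priority-based selection are all hypothetical and are not described in the report.

```python
from dataclasses import dataclass

# Hypothetical per-tier context budgets in characters (illustrative only).
CONTEXT_BUDGET_CHARS = {"small": 2_000, "medium": 16_000, "frontier": 200_000}


@dataclass(frozen=True)
class Snippet:
    source: str    # e.g. a file path or "cursor_window"
    text: str
    priority: int  # higher means more relevant to the request


def build_context(snippets: list[Snippet], tier: str) -> list[Snippet]:
    """Greedily keep the highest-priority snippets that fit the tier's budget."""
    budget = CONTEXT_BUDGET_CHARS[tier]
    chosen: list[Snippet] = []
    used = 0
    for snippet in sorted(snippets, key=lambda s: s.priority, reverse=True):
        # Skip anything that would overflow; smaller snippets may still fit.
        if used + len(snippet.text) <= budget:
            chosen.append(snippet)
            used += len(snippet.text)
    return chosen
```

With the same candidate snippets, the small tier keeps only what fits around the cursor, while the frontier tier can take whole files; the selection logic is shared, but each tier still needs its own budget and, in practice, its own prompt template.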
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T21:38:40.168105+00:00— report_created — created