Report #50495
[synthesis] Should my AI coding product use one unified model/architecture for completions and agentic tasks?
Architect two distinct layers: a predictive layer (fast, <200ms, single-shot, small model, inline UX) for completions and suggestions, and an agentic layer (slower, 10-60s, multi-step tool-using loop, frontier model, chat/panel UX) for complex tasks. The two have fundamentally different latency budgets, model requirements, infrastructure, and UX patterns, so do not unify them.
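As a rough illustration of keeping the two layers separate at the configuration level, the sketch below records each layer's budget and routes by feature. All names here (`Layer`, `LayerConfig`, `layer_for`, the tier labels) are hypothetical, and the step cap of 25 is an assumed placeholder; only the 200ms and 60s budgets come from the recommendation above.

```python
from dataclasses import dataclass
from enum import Enum


class Layer(Enum):
    PREDICTIVE = "predictive"
    AGENTIC = "agentic"


@dataclass(frozen=True)
class LayerConfig:
    latency_budget_ms: int  # upper bound the UX is assumed to tolerate
    model_tier: str         # hypothetical label, not a real model name
    max_model_calls: int    # 1 = single-shot; >1 = multi-step loop
    surface: str            # where the result is rendered


LAYERS = {
    # Budgets taken from the text; the agentic step cap (25) is an assumption.
    Layer.PREDICTIVE: LayerConfig(200, "small-finetuned", 1, "inline"),
    Layer.AGENTIC: LayerConfig(60_000, "frontier", 25, "panel"),
}


def layer_for(request_kind: str) -> Layer:
    """Route by feature rather than by a mode flag on one shared pipeline."""
    if request_kind in {"completion", "suggestion"}:
        return Layer.PREDICTIVE
    return Layer.AGENTIC
```

Keeping the configs in separate entries makes it harder for a change tuned for one layer (for example, a larger model) to leak into the other by accident.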
Journey Context:
These two layers have contradictory requirements that cannot be optimized simultaneously. The predictive layer needs sub-200ms latency (users will disable completions that feel laggy), works best with small fine-tuned models that predict the next edit from the surrounding context, and is a single-shot inference call. The agentic layer can take 10-60 seconds (users tolerate this for complex tasks), requires frontier reasoning models that can plan and decompose work, and iterates through multiple tool calls. Cursor's architecture explicitly separates tab completion (predictive, custom model, inline ghost text) from agent mode (agentic, frontier model, side panel). Windsurf separates inline completions from Cascade flows. Attempting to unify the layers produces either a slow completion experience (if you route everything to a frontier model) or a weak agent (if you use a small model). The infrastructure also differs: the predictive layer needs high-throughput, low-latency inference with aggressive batching, while the agentic layer needs tool-execution orchestration and state management. The product implication: these are two different features that happen to share a codebase, not one feature with two modes.
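The difference in control flow can be sketched as follows. This is a minimal illustration, not how Cursor or Windsurf implement either layer: the `Model` and `Tool` callables, the `"tool_name: argument"` reply format, and the `DONE:` marker are all assumptions made up for the example.

```python
from typing import Callable

# Hypothetical interfaces: a model maps a prompt to text, a tool maps an argument to text.
Model = Callable[[str], str]
Tool = Callable[[str], str]


def predict_completion(small_model: Model, context: str) -> str:
    """Predictive layer: exactly one inference call, no state kept between calls."""
    return small_model(context)


def run_agent(frontier_model: Model, tools: dict[str, Tool], task: str,
              max_steps: int = 10) -> str:
    """Agentic layer: alternate model calls and tool calls until the model replies DONE."""
    transcript = [f"TASK: {task}"]
    for _ in range(max_steps):
        reply = frontier_model("\n".join(transcript))
        if reply.startswith("DONE:"):
            return reply[len("DONE:"):].strip()
        # Assumed reply format: "tool_name: argument"
        name, _, arg = reply.partition(":")
        tool = tools.get(name.strip())
        observation = tool(arg.strip()) if tool else f"unknown tool {name.strip()!r}"
        # State accumulates across steps; the predictive path has no equivalent.
        transcript.append(f"ACTION: {reply}")
        transcript.append(f"OBSERVATION: {observation}")
    return "stopped: step limit reached"
```

The predictive path is a pure function of its context, which is what makes batching and tight latency budgets feasible. The agentic path carries a growing transcript and depends on tool execution, so it needs orchestration and state management instead.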
Lifecycle
2026-06-19T15:14:31.868653+00:00— report_created — created