Report #34993
[synthesis] How to architect LLM code editing UI for low latency
Decouple inline code generation (fast, single-turn, diff-based) from agentic chat (slow, multi-turn, context-heavy). Use a specialized fast-apply model for inline edits and a reasoning model for chat.
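To make "diff-based" concrete, here is a minimal sketch of applying a search/replace edit so that only the targeted span of the buffer changes. The `SearchReplaceEdit` format and the `apply_edit` helper are hypothetical names chosen for illustration; they are not taken from any particular editor or model API, and a real fast-apply model may use a different edit representation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SearchReplaceEdit:
    """Hypothetical edit format: swap one exact snippet for another."""
    search: str
    replace: str


def apply_edit(source: str, edit: SearchReplaceEdit) -> str:
    """Apply the edit only when the search snippet is unambiguous.

    Rejecting zero or multiple matches keeps the edit from silently
    touching code the user did not target.
    """
    if not edit.search:
        raise ValueError("search snippet must not be empty")
    count = source.count(edit.search)
    if count != 1:
        raise ValueError(f"expected exactly 1 match, found {count}")
    return source.replace(edit.search, edit.replace, 1)


buffer = (
    "def add(a, b):\n"
    "    return a - b\n"
    "\n"
    "def mul(a, b):\n"
    "    return a * b\n"
)

# Fix the bug in add() without regenerating the rest of the file.
edit = SearchReplaceEdit(search="return a - b", replace="return a + b")
patched = apply_edit(buffer, edit)
```

Because the model only has to emit the small edit rather than the whole file, the response is short, which is one plausible reason a diff-based inline path can stay fast.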
Journey Context:
Developers often try to build a single chat interface that also edits code. This tends to fail because chat-oriented models are slow and rewrite more code than needed, while inline edits must be fast and change only what is necessary. Cursor's architecture illustrates the split: `Cmd+K` uses a fast, specialized diff-application model, while `Cmd+L` uses a heavier reasoning model. The tradeoff is added UI complexity in exchange for lower latency and more reliable edits.
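The split between the two entry points can be sketched as a small routing table. Everything here is an assumption for illustration: the model names are placeholders, the timeout values are arbitrary latency budgets, and `build_request` is a hypothetical helper rather than any vendor's API.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List, Sequence


class Surface(Enum):
    INLINE = "inline"  # a Cmd+K-style edit on the current selection
    CHAT = "chat"      # a Cmd+L-style multi-turn conversation


@dataclass(frozen=True)
class Route:
    model: str          # placeholder name, not a real model identifier
    timeout_s: float    # assumed latency budget for this surface
    send_history: bool  # whether earlier turns are included


# Assumed routing table: the inline path is single-turn with a tight
# budget, the chat path carries history with a generous budget.
ROUTES: Dict[Surface, Route] = {
    Surface.INLINE: Route("fast-apply-model", timeout_s=2.0, send_history=False),
    Surface.CHAT: Route("reasoning-model", timeout_s=60.0, send_history=True),
}


def build_request(
    surface: Surface,
    prompt: str,
    selection: str = "",
    history: Sequence[str] = (),
) -> dict:
    """Build a request payload for the model that serves this surface."""
    route = ROUTES[surface]
    messages: List[str] = list(history) if route.send_history else []
    messages.append(prompt)
    return {
        "model": route.model,
        "timeout_s": route.timeout_s,
        "messages": messages,
        # Only the inline path sends the selected code span to be edited.
        "selection": selection if surface is Surface.INLINE else "",
    }


inline_req = build_request(
    Surface.INLINE, "rename x to total", selection="x = 1", history=["earlier turn"]
)
chat_req = build_request(
    Surface.CHAT, "why is this slow?", history=["earlier turn"]
)
```

Keeping the routing decision in one table means the UI cost of having two entry points stays contained: each surface picks its model, context policy, and latency budget in a single place.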
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T13:12:47.231844+00:00— report_created — created