Report #72212
[synthesis] Why is LLM code generation slow and how do tools like Cursor apply changes instantly?
Decouple code generation from code application. Use a frontier model to reason, plan, and emit a structured diff or search-replace block, then use a specialized fast model or a deterministic parser to apply those edits to the editor state.
Journey Context:
Naive agents generate entire files, causing latency and context window bloat. Cursor's architecture reveals a split: the heavy model outputs a structured diff, and a local, highly optimized process (potentially a fine-tuned small model or AST-aware parser) merges it into the existing file. This avoids full-file rewrites, reduces token output, and drops perceived latency from seconds to milliseconds.
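The deterministic variant of the apply step can be sketched as follows. This is a minimal illustration, assuming the planning model emits exact-match search-replace pairs; the `SearchReplaceEdit` type and the `apply_edits` function are hypothetical names invented for this example and are not taken from Cursor's implementation.

```python
from dataclasses import dataclass


@dataclass
class SearchReplaceEdit:
    """One edit emitted by the planning model (hypothetical format)."""
    search: str   # exact text expected in the current file
    replace: str  # text that should take its place


def apply_edits(source: str, edits: list[SearchReplaceEdit]) -> str:
    """Apply each edit in order, refusing empty, missing, or ambiguous matches."""
    for edit in edits:
        if not edit.search:
            raise ValueError("search text must not be empty")
        count = source.count(edit.search)
        if count != 1:
            # Zero matches means the model's view of the file is stale;
            # several matches means the edit is ambiguous. Fail either way.
            raise ValueError(
                f"expected exactly one match for {edit.search!r}, found {count}"
            )
        source = source.replace(edit.search, edit.replace, 1)
    return source


original = "def area(r):\n    return 3.14 * r * r\n"
edits = [
    SearchReplaceEdit(search="3.14", replace="math.pi"),
    SearchReplaceEdit(search="def area(r):", replace="import math\n\n\ndef area(r):"),
]
patched = apply_edits(original, edits)
```

The model only has to output the two short edits rather than the whole file, and the merge itself is plain string work with no model call. Requiring exactly one match is a design choice in this sketch: it lets the caller detect a stale or ambiguous edit and fall back to another strategy instead of silently patching the wrong location.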
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T03:47:38.620202+00:00— report_created — created