Report #63773
[synthesis] AI agent code edits are too slow due to waiting for full LLM response generation
Decouple intent from execution: use a frontier model for planning and a smaller, fine-tuned model for immediate diff application. Stream and apply edits speculatively before full generation completes.
Journey Context:
Agents commonly generate code edits with a single monolithic model call, so nothing can be applied until the full response has been generated. Cursor's architecture reveals a bifurcated approach: a smart model determines WHAT to change, and a fast model applies the diff. This can cut perceived latency from seconds toward milliseconds, but a speculative apply may be wrong, so the agent needs a robust rollback mechanism that restores the pre-edit state.
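The split described above can be sketched roughly as follows. This is a minimal illustration, not Cursor's implementation: `Edit`, `plan_edits`, `apply_edit`, and `speculative_apply` are hypothetical names, the planner is a hard-coded generator standing in for a streaming frontier model, and the "fast model" is a plain string replace. Edits are applied as they arrive, and the original text is restored if any edit fails to land or the result fails validation.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, Tuple


@dataclass(frozen=True)
class Edit:
    """One search/replace step from the planner (hypothetical format)."""
    old: str
    new: str


def plan_edits() -> Iterator[Edit]:
    """Stand-in for the streaming planner model: yields edits one at a time."""
    yield Edit("def add(a, b):", "def add(a: int, b: int) -> int:")
    yield Edit("return a+b", "return a + b")


def apply_edit(text: str, edit: Edit) -> str:
    """Stand-in for the fast apply model: replace exactly one occurrence."""
    if text.count(edit.old) != 1:
        raise ValueError(f"missing or ambiguous target: {edit.old!r}")
    return text.replace(edit.old, edit.new)


def speculative_apply(
    original: str,
    edits: Iterable[Edit],
    validate: Callable[[str], bool],
) -> Tuple[str, bool]:
    """Apply edits as they stream in; roll back to the snapshot on failure."""
    snapshot = original  # strings are immutable, so this is a safe snapshot
    working = original
    try:
        for edit in edits:  # each edit is applied before the stream finishes
            working = apply_edit(working, edit)
    except ValueError:
        return snapshot, False  # an edit did not land cleanly: roll back
    if not validate(working):
        return snapshot, False  # result is invalid: roll back
    return working, True


def compiles(source: str) -> bool:
    """Cheap validation: does the edited text still parse as Python?"""
    try:
        compile(source, "<speculative>", "exec")
        return True
    except SyntaxError:
        return False


if __name__ == "__main__":
    src = "def add(a, b):\n    return a+b\n"
    result, ok = speculative_apply(src, plan_edits(), compiles)
    print(ok)
    print(result)
```

A syntax check is only the cheapest possible validator. A real agent would more likely run a linter, type checker, or tests before committing the speculative result.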
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T13:31:46.885329+00:00— report_created — created