Report #79410
[synthesis] Streaming LLM output directly into source files for code edits
Decouple the reasoning model from the apply model: have the frontier model produce an edit intent, then pass it through a specialized fast-apply model that computes a precise diff against the current file state before writing.
Journey Context:
Streaming generated tokens straight into files causes three problems: partial writes when generation stops midway, indentation drift relative to the surrounding code, and merge conflicts on concurrent edits. Cursor's architecture points to the fix: a separate 'Fast Apply' model that converts the LLM's edit description into an exact diff, which is then applied atomically. This is why Cursor edits feel fast and precise compared with naive streaming. The apply model is small, fast, and trained specifically on the diff-application task. GitHub Copilot's edit feature reportedly uses a similar two-stage pattern. The tradeoff is an extra model call, with its added latency and cost, but it pays for itself in reliability and UX quality.
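
The two-stage flow described above can be sketched roughly as follows. This is a minimal illustration, not Cursor's or Copilot's actual implementation: `EditIntent`, `sha256`, `apply_edit`, and the `apply_model` callable are hypothetical names, and the sketch assumes the apply model returns the full updated file text, from which the diff is computed with the standard library.

```python
import difflib
import hashlib
import os
import tempfile
from dataclasses import dataclass
from typing import Callable


@dataclass
class EditIntent:
    """Hypothetical container for the reasoning model's output."""
    path: str
    description: str   # natural-language or sketched edit from the frontier model
    base_sha256: str   # hash of the file content the reasoning model saw


def sha256(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def apply_edit(intent: EditIntent, apply_model: Callable[[str, str], str]) -> str:
    """Apply an edit in one atomic step and return the unified diff."""
    # newline="" keeps the file's original line endings untouched.
    with open(intent.path, encoding="utf-8", newline="") as f:
        current = f.read()

    # Refuse to apply against a file that changed after the intent was produced.
    if sha256(current) != intent.base_sha256:
        raise RuntimeError("file changed since the edit intent was produced")

    # Stage 2 (assumption): the apply model maps (current text, intent) to new text.
    updated = apply_model(current, intent.description)

    diff = "".join(difflib.unified_diff(
        current.splitlines(keepends=True),
        updated.splitlines(keepends=True),
        fromfile=intent.path,
        tofile=intent.path,
    ))

    # Write to a temp file in the same directory, then rename over the target,
    # so readers never observe a half-written file.
    directory = os.path.dirname(os.path.abspath(intent.path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8", newline="") as tmp:
            tmp.write(updated)
        os.replace(tmp_path, intent.path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    return diff
```

Nothing is written until the apply model has returned a complete result, so an interrupted generation leaves the file untouched. The hash check only narrows the window for concurrent edits; a change that lands between the check and the rename would still be overwritten, so a real system would likely need locking or editor-level buffer versioning on top of this.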
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T15:53:27.052684+00:00— report_created — created