Report #23163
[synthesis] Why does regenerating the entire file on every edit cause latency and errors in AI code editors?
Implement a two-model architecture: use a heavy reasoning model to determine the *intent* and *location* of the edit, then pass that to a smaller, fine-tuned 'apply' model (or algorithm) that performs a structured search-and-replace diff.
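As a rough illustration of the algorithmic variant of the 'apply' step, the sketch below applies structured search-and-replace edits and refuses to guess when the search text is missing or ambiguous. The `Edit` schema, `ApplyError`, and the function names are hypothetical, chosen for this example; they are not taken from Cursor or Aider.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edit:
    """One structured edit, as a reasoning model might emit it (hypothetical schema)."""
    search: str   # exact text expected to exist in the file
    replace: str  # text to put in its place


class ApplyError(ValueError):
    """Raised when an edit cannot be applied unambiguously."""


def apply_edit(source: str, edit: Edit) -> str:
    """Apply one edit, requiring exactly one match of the search text."""
    if not edit.search:
        raise ApplyError("empty search text")
    count = source.count(edit.search)
    if count == 0:
        raise ApplyError("search text not found")
    if count > 1:
        raise ApplyError(f"search text matches {count} locations")
    return source.replace(edit.search, edit.replace, 1)


def apply_edits(source: str, edits: list[Edit]) -> str:
    """Apply edits in order; each one sees the result of the previous one."""
    for edit in edits:
        source = apply_edit(source, edit)
    return source


if __name__ == "__main__":
    src = (
        "def add(a, b):\n    return a - b\n\n"
        "def sub(a, b):\n    return a - b\n"
    )
    # Including the function header makes the search text unique.
    fix = Edit(
        search="def add(a, b):\n    return a - b",
        replace="def add(a, b):\n    return a + b",
    )
    print(apply_edits(src, [fix]))
```

Rejecting zero or multiple matches is a deliberate choice in this sketch: a failed apply can be sent back to the reasoning model for a more specific search block, whereas a silent edit in the wrong place cannot be detected afterwards.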
Journey Context:
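Naive agents re-emit the whole file for every edit, so output cost and latency grow as O(n) in the file's token count even when only one line changes, and every regenerated line is another chance to introduce an unintended change. Standard unified diffs are hard for LLMs because hunk headers depend on exact line numbers, which shift as earlier edits change the file's length. Cursor's 'fast-apply' model and Aider's search/replace blocks show that separating the 'what' (reasoning) from the 'how' (application) reduces latency and improves accuracy by constraining the application step.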
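The line-number problem can be seen in a toy example. The sketch below plans two edits against the same original file and applies them once by line number and once by content. The helper names are hypothetical, and the line-number path is a deliberately naive illustration of stale addresses, not a model of how real patch tools apply hunks.

```python
def replace_line(lines, lineno, new_lines):
    """Replace the 1-indexed line `lineno` with `new_lines` (line-number addressing)."""
    return lines[:lineno - 1] + new_lines + lines[lineno:]


def replace_text(lines, old, new_lines):
    """Replace the first line equal to `old` with `new_lines` (content addressing)."""
    i = lines.index(old)
    return lines[:i] + new_lines + lines[i + 1:]


original = ["import os", "x = 1", "y = 2", "print(x + y)"]

# Both edits were planned against the ORIGINAL file:
#   edit A: line 2 ("x = 1") becomes two lines
#   edit B: line 3 ("y = 2") becomes "y = 20"

# Line-number addressing: after edit A, line 3 is no longer "y = 2".
by_number = replace_line(original, 2, ["x = 1", "x += 1"])
by_number = replace_line(by_number, 3, ["y = 20"])

# Content addressing: each edit finds its target wherever it has moved to.
by_content = replace_text(original, "x = 1", ["x = 1", "x += 1"])
by_content = replace_text(by_content, "y = 2", ["y = 20"])

print(by_number)   # edit B overwrote "x += 1" and left "y = 2" in place
print(by_content)  # both edits landed on their intended lines
```

The content-addressed version tolerates the shift because its anchor is the text itself, which is the property that search/replace blocks rely on.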
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T17:17:13.296162+00:00— report_created — created