Report #93038
[synthesis] Single LLM call handles both code change reasoning and precise diff application
Decompose into two distinct model calls: a reasoning model for generation and a specialized fast model for diff application. The generation model outputs a change specification (what to change and why); the apply model converts that spec into exact edit operations against the current file state.
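A minimal sketch of the two-call split, assuming each model is exposed as a plain prompt-to-text function. The names `ChangeSpec`, `ModelFn`, `generate_spec`, and `apply_spec` are hypothetical and used for illustration only. They are not taken from any product's API, and the prompt wording is a placeholder.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical model interface: any function from a prompt string to text.
ModelFn = Callable[[str], str]


@dataclass(frozen=True)
class ChangeSpec:
    """Intermediate state handed from the generation call to the apply call."""
    path: str         # file the change targets
    intent: str       # why the change is needed
    description: str  # what should change, as produced by the reasoning model


def generate_spec(reasoning_model: ModelFn, path: str, request: str, source: str) -> ChangeSpec:
    """Call 1: reason about what to change; no exact line edits are requested."""
    prompt = f"Request: {request}\nFile: {path}\n---\n{source}\n---\nDescribe the change."
    return ChangeSpec(path=path, intent=request, description=reasoning_model(prompt))


def apply_spec(apply_model: ModelFn, spec: ChangeSpec, current_source: str) -> str:
    """Call 2: convert the spec into an updated file against the current file state."""
    prompt = (
        f"Change: {spec.description}\n---\n{current_source}\n---\n"
        "Return the full updated file."
    )
    return apply_model(prompt)
```

Because `apply_spec` receives the file contents at apply time rather than reusing what the generation call saw, the apply step works against the current file state, and either model function can be swapped out without touching the other.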
Journey Context:
Combining generation and application in one call creates conflicting optimization objectives. Generation needs deep semantic reasoning about what should change; application needs precise, mechanical computation of exact line-level edits. Cursor's architecture demonstrates this decomposition: its apply step uses a separate, faster model optimized for the mechanical insertion and editing task, which is observable in the distinct latency profiles of the 'generating' and 'applying' phases. Aider independently arrived at a similar pattern with its search/replace block format, which separates reasoning from mechanical edit application. The tradeoff is increased system complexity: two model calls, intermediate state to manage, and potential misalignment between the generation intent and the application result. The payoff is cleaner diffs, faster apply times, and the ability to improve each step independently. Products that combine both steps often struggle with diffs that do not apply cleanly, because the model is simultaneously reasoning about what to change and computing exact line-level edits, which leads to off-by-one errors and malformed hunks.
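To illustrate why a mechanical apply step avoids off-by-one errors, the sketch below applies edits by exact text match instead of line numbers and refuses any edit that does not match exactly once. It is loosely modeled on the general idea of search/replace blocks and is not Aider's or Cursor's actual implementation. `SearchReplace`, `ApplyError`, and `apply_edits` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SearchReplace:
    search: str   # exact text expected in the current file
    replace: str  # text that should take its place


class ApplyError(ValueError):
    """Raised when an edit cannot be applied unambiguously."""


def apply_edits(source: str, edits: list[SearchReplace]) -> str:
    """Apply edits in order, requiring exactly one match for each search block."""
    for index, edit in enumerate(edits):
        if not edit.search:
            raise ApplyError(f"edit {index}: empty search text")
        matches = source.count(edit.search)
        if matches != 1:
            # Zero matches means the spec has drifted from the file;
            # several matches means the target location is ambiguous.
            raise ApplyError(f"edit {index}: expected 1 match, found {matches}")
        source = source.replace(edit.search, edit.replace, 1)
    return source
```

Failing loudly on zero or multiple matches surfaces misalignment between the generation intent and the file state as an explicit error, which a caller could use to retry the apply call instead of writing a malformed edit.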
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T14:45:01.829852+00:00— report_created — created