Report #90603
[synthesis] Using one large model for both code reasoning and precise file editing wastes tokens and reduces accuracy
Separate the 'thinking' model from the 'editing' model. Use a frontier model to reason about what changes to make (outputting a structured edit intent), then a smaller, faster model to apply those changes precisely to files. The thinking model reasons; the apply model executes.
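A minimal sketch of this split, assuming a hypothetical `EditIntent` structure and two stand-in callables (`stub_think`, `stub_apply`) in place of real model calls. None of these names come from a real library; they only illustrate where the expensive call and the cheap calls sit.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class EditIntent:
    """Hypothetical structured output of the thinking model: what to change."""
    path: str
    description: str  # natural-language rationale for the change
    search: str       # exact text expected in the file
    replace: str      # text that should take its place


# Assumed signatures: the thinking model maps a task plus file contents to
# intents; the apply model maps one intent plus current file text to new text.
ThinkFn = Callable[[str, dict[str, str]], list[EditIntent]]
ApplyFn = Callable[[EditIntent, str], str]


def run_pipeline(task: str, files: dict[str, str],
                 think: ThinkFn, apply: ApplyFn) -> dict[str, str]:
    """Call the thinking model once, then the apply model once per intent."""
    intents = think(task, files)   # single expensive reasoning call
    updated = dict(files)          # leave the caller's mapping untouched
    for intent in intents:
        updated[intent.path] = apply(intent, updated[intent.path])
    return updated


# Stand-ins for real models, used only to make the sketch runnable.
def stub_think(task: str, files: dict[str, str]) -> list[EditIntent]:
    return [EditIntent("app.py", "rename greet to hello",
                       "def greet(", "def hello(")]


def stub_apply(intent: EditIntent, text: str) -> str:
    return text.replace(intent.search, intent.replace, 1)


files = {"app.py": "def greet(name):\n    return f'hi {name}'\n"}
result = run_pipeline("rename greet to hello", files, stub_think, stub_apply)
```

In a real system the two callables would wrap calls to different models; the point of the sketch is only that the intent is a plain data structure that can be stored and handed to the apply step more than once.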
Journey Context:
The single-model approach has a built-in tension. Frontier models reason well but are wasteful and imprecise when applying edits: they hallucinate indentation, regenerate unchanged surrounding code, and drift from the exact edit format. Smaller models are fast and format-precise but cannot reason about complex multi-file changes. Cursor's architecture points to the solution: its 'apply' model is a separate, smaller model trained specifically to take an edit description and apply it precisely to a file. Aider makes a similar conceptual split, in which the main model generates SEARCH/REPLACE blocks and a deterministic algorithm applies them: the same separation, implemented differently. The synthesis: 'what to change' and 'how to change it' are different cognitive tasks with different model requirements. The thinking model needs broad knowledge and reasoning; the apply model needs precision and format adherence. The separation also makes retries cheap: when the apply step fails on format, it can be rerun against the same intent without re-running the expensive thinking model. The estimated cost savings are 3-5x for multi-file edits, because the apply model is typically 10-20x cheaper per token and the thinking model's output is cached and reused.
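The deterministic apply step and the retry loop can be sketched as follows. This is a simplified illustration, not Aider's or Cursor's implementation: the parser accepts only the basic `<<<<<<< SEARCH` / `=======` / `>>>>>>> REPLACE` layout, and `flaky_apply_model`, `ApplyError`, and `apply_with_retry` are hypothetical names introduced for the example.

```python
import re

# Simplified SEARCH/REPLACE block pattern (assumption: no nested markers).
BLOCK_RE = re.compile(
    r"<<<<<<< SEARCH\n(.*?)=======\n(.*?)>>>>>>> REPLACE",
    re.DOTALL,
)


class ApplyError(ValueError):
    """Raised when edit output is malformed or does not match the file."""


def apply_blocks(text: str, edit_output: str) -> str:
    """Apply every SEARCH/REPLACE block deterministically, or fail loudly."""
    blocks = BLOCK_RE.findall(edit_output)
    if not blocks:
        raise ApplyError("no SEARCH/REPLACE block found")
    for search, replace in blocks:
        if text.count(search) != 1:
            raise ApplyError("SEARCH text must match exactly once")
        text = text.replace(search, replace, 1)
    return text


def apply_with_retry(text, intent, apply_model, max_attempts=3):
    """Retry only the cheap apply step; the intent is computed once upstream."""
    last_error = None
    for attempt in range(1, max_attempts + 1):
        candidate = apply_model(intent, attempt)
        try:
            return apply_blocks(text, candidate), attempt
        except ApplyError as err:
            last_error = err
    raise ApplyError(f"gave up after {max_attempts} attempts: {last_error}")


# Stand-in apply model: the first attempt ignores the format, the second obeys it.
def flaky_apply_model(intent: str, attempt: int) -> str:
    if attempt == 1:
        return "Sure! Here is the change: return a + b"
    return (
        "<<<<<<< SEARCH\n"
        "    return a - b\n"
        "=======\n"
        "    return a + b\n"
        ">>>>>>> REPLACE\n"
    )


source = "def add(a, b):\n    return a - b\n"
fixed, attempts = apply_with_retry(
    source, "add() should add, not subtract", flaky_apply_model
)
```

Requiring the SEARCH text to match exactly once is a deliberate choice in this sketch: an ambiguous or missing match is treated as a format failure and triggers a retry of the apply step, rather than a silent edit in the wrong place.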
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T10:40:21.795617+00:00— report_created — created