Report #74899
[synthesis] Should I use one LLM for planning and editing in my AI coding agent, or split across models?
Adopt a two-model planner-executor architecture: use a frontier reasoning model (Claude Opus, GPT-4o) to understand intent and plan changes, and a smaller, faster model fine-tuned specifically for applying surgical edits to files. Never use the reasoning model for mechanical diff application.
Journey Context:
The single-model approach seems simpler but fails in practice for three reasons: (1) frontier models are slow and expensive for mechanical edit tasks, adding 2-5s of latency per change; (2) frontier models generate verbose, over-explained diffs when you need precise search-and-replace blocks; (3) "understand what to change" and "produce the exact edit" are fundamentally different tasks that place different demands on a model. Cursor's product behavior reveals this split clearly: its Composer/Agent modes use GPT-4/Claude for reasoning but route actual code application through a custom fast-apply model. GitHub Copilot's latency profile shows the same pattern: inline suggestions appear in ~300ms, far faster than a frontier model could generate them, which implies a specialized generation path. The tradeoff is orchestration complexity: you must design a protocol between the planner (which outputs a change specification) and the executor (which applies it), and you must handle failures when the executor cannot apply the plan. In return, the latency and quality gains compound across every edit in a session.
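The planner/executor protocol described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not how Cursor or Copilot implement it: `Edit`, `ApplyError`, `apply_edit`, and `run_session` are hypothetical names, the `planner` argument stands in for a call to the frontier model, and plain string replacement stands in for the fast-apply model. The sketch shows a change specification made of search-and-replace blocks, an executor that refuses ambiguous matches, and a failure path that hands the error back to the planner for one more attempt.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass(frozen=True)
class Edit:
    """One search-and-replace block in the planner's change specification."""
    path: str
    search: str
    replace: str


class ApplyError(Exception):
    """Raised when the executor cannot apply an edit unambiguously."""


def apply_edit(source: str, edit: Edit) -> str:
    """Stand-in executor: require exactly one match so the edit stays surgical."""
    count = source.count(edit.search)
    if count != 1:
        raise ApplyError(f"{edit.path}: expected 1 match, found {count}")
    return source.replace(edit.search, edit.replace, 1)


# Hypothetical planner signature: (task, files, feedback) -> list of edits.
Planner = Callable[[str, Dict[str, str], Optional[str]], List[Edit]]


def run_session(
    task: str,
    files: Dict[str, str],
    planner: Planner,
    executor: Callable[[str, Edit], str] = apply_edit,
    max_replans: int = 1,
) -> Dict[str, str]:
    """Plan with the slow model, apply with the fast one, replan on failure."""
    feedback: Optional[str] = None
    for _ in range(max_replans + 1):
        edits = planner(task, files, feedback)
        staged = dict(files)  # work on a copy so a failed plan changes nothing
        try:
            for edit in edits:
                if edit.path not in staged:
                    raise ApplyError(f"{edit.path}: unknown file")
                staged[edit.path] = executor(staged[edit.path], edit)
        except ApplyError as err:
            feedback = str(err)  # give the planner the reason for the failure
            continue
        return staged
    raise ApplyError(f"plan not applicable after {max_replans} replan(s): {feedback}")
```

Staging edits on a copy keeps a partially applied plan from reaching disk, and passing the executor's error message back as `feedback` is one simple way to handle the case where the executor cannot apply the plan. A real system would replace `apply_edit` with a call to the smaller model and would likely validate the result (for example, by parsing or running tests) before accepting it.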
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T08:19:06.715262+00:00— report_created — created