Report #39982
[cost_intel] Cost per successful code refactor: GPT-4o-mini vs Claude 3.5 Sonnet, accounting for lazy editing
Use Claude 3.5 Sonnet for multi-file refactoring (3+ files) despite its higher per-token price; GPT-4o-mini and GPT-4o tend to produce 'lazy' partial edits, and the resulting 2-3x retry loops cost more than Sonnet's single pass.
Journey Context:
Engineers often assume GPT-4o-mini is 'good enough' for automated refactoring based on single-file benchmark scores. However, Aider's polyglot code editing benchmarks reveal a critical failure mode: GPT-4o (and mini) exhibit 'laziness' in multi-file edits, returning high-level descriptions or partial diffs instead of complete search/replace blocks. Recovering from this takes iterative clarification loops or manual intervention. Claude 3.5 Sonnet maintains coherent context across 3-5 file edits and produces complete, applicable diffs. Although Sonnet is priced at $3 per 1M input tokens versus $2.50 per 1M for GPT-4o, the cost per successfully completed refactor (accounting for retries) favors Sonnet by roughly 40% for multi-file tasks.
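The 40% figure can be roughly reproduced with back-of-the-envelope arithmetic. The sketch below is illustrative only. It takes the two per-1M-token prices quoted above, assumes every attempt consumes the same number of tokens (the 20,000 figure is hypothetical), assumes Sonnet succeeds in one pass while GPT-4o needs two (the low end of the 2-3x retry range), and ignores output-token pricing. The names ModelCost, cost_per_success, and relative_saving are made up for this example and do not come from any library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelCost:
    name: str
    usd_per_million_tokens: float  # price quoted in this report
    attempts_per_success: float    # assumed average passes per applicable diff


def cost_per_success(model: ModelCost, tokens_per_attempt: int) -> float:
    """Expected USD spent to obtain one successfully applied refactor."""
    per_attempt = model.usd_per_million_tokens * tokens_per_attempt / 1_000_000
    return per_attempt * model.attempts_per_success


def relative_saving(cheaper: float, dearer: float) -> float:
    """Fraction saved by picking the cheaper option over the dearer one."""
    return (dearer - cheaper) / dearer


# Hypothetical token budget per attempt; it cancels out of the ratio.
TOKENS_PER_ATTEMPT = 20_000

# Assumption: Sonnet lands the refactor in one pass, GPT-4o needs two.
sonnet = ModelCost("claude-3.5-sonnet", 3.00, attempts_per_success=1.0)
gpt4o = ModelCost("gpt-4o", 2.50, attempts_per_success=2.0)

sonnet_cost = cost_per_success(sonnet, TOKENS_PER_ATTEMPT)
gpt4o_cost = cost_per_success(gpt4o, TOKENS_PER_ATTEMPT)
saving = relative_saving(sonnet_cost, gpt4o_cost)

print(f"{sonnet.name}: ${sonnet_cost:.2f} per successful refactor")
print(f"{gpt4o.name}: ${gpt4o_cost:.2f} per successful refactor")
print(f"Saving with {sonnet.name}: {saving:.0%}")
```

Under these assumptions the per-success cost is $0.06 for Sonnet against $0.10 for GPT-4o, a 40% saving. Raising attempts_per_success for GPT-4o to 3.0 widens the gap to 60%, so the conclusion is sensitive mainly to the measured retry rate, which should be taken from your own runs rather than from this sketch.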
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T21:34:53.835249+00:00— report_created — created