Report #86087
[cost_intel] Cross-file architectural refactoring >500 lines loses coherence with instruct models
Use reasoning models (o1/o3) for complex refactoring that spans more than 3 files or more than 500 lines; the cost premium (roughly 10-25x at the per-refactor figures quoted below) is cheaper than accumulating architectural debt
Journey Context:
Instruct models struggle with long-range dependencies in large-scale refactoring tasks, such as renaming a core interface that propagates through 10 files or migrating a design pattern. They tend to lose track of constraints after roughly 4k-8k tokens of context, which produces partial refactors that break the build. Reasoning models, which are trained with reinforcement learning to work through multi-step problems, maintain coherence across changes of 1000+ lines; this report cites solve rates of 60-90% on SWE-bench Verified tasks versus under 20% for GPT-4o. A complex refactor costs about $0.20-$0.50 with a reasoning model versus about $0.02 with an instruct model, but shipping broken code costs more than the difference.
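The routing rule and cost trade-off described above can be sketched as a small Python helper. This is a minimal illustration, not a tested policy: `RefactorTask`, `pick_model_tier`, and `break_even_failure_cost` are hypothetical names, the thresholds (3 files, 500 lines) and dollar figures are copied from this report, and the failure-rate gap is an input you would have to estimate for your own codebase.

```python
from dataclasses import dataclass

# Thresholds taken from this report's recommendation; tune for your codebase.
MAX_FILES_FOR_INSTRUCT = 3
MAX_LINES_FOR_INSTRUCT = 500

# Per-refactor cost figures quoted in this report (USD); not measured here.
INSTRUCT_COST = 0.02
REASONING_COST_RANGE = (0.20, 0.50)


@dataclass(frozen=True)
class RefactorTask:
    """Hypothetical description of a planned refactor."""
    files_touched: int
    lines_changed: int


def pick_model_tier(task: RefactorTask) -> str:
    """Return "reasoning" if the task exceeds either threshold, else "instruct"."""
    too_many_files = task.files_touched > MAX_FILES_FOR_INSTRUCT
    too_many_lines = task.lines_changed > MAX_LINES_FOR_INSTRUCT
    return "reasoning" if (too_many_files or too_many_lines) else "instruct"


def break_even_failure_cost(failure_rate_gap: float, reasoning_cost: float) -> float:
    """Cost of one broken refactor above which the reasoning tier pays for itself.

    failure_rate_gap is the assumed drop in failure probability (0 < gap <= 1)
    gained by switching from the instruct tier to the reasoning tier.
    """
    if not 0 < failure_rate_gap <= 1:
        raise ValueError("failure_rate_gap must be in (0, 1]")
    return (reasoning_cost - INSTRUCT_COST) / failure_rate_gap


if __name__ == "__main__":
    rename = RefactorTask(files_touched=10, lines_changed=800)
    print(pick_model_tier(rename))
    print(round(break_even_failure_cost(0.4, REASONING_COST_RANGE[1]), 2))
```

Under these assumptions, a 10-file interface rename routes to the reasoning tier. If the reasoning tier cut the failure rate by 40 percentage points at the upper $0.50 cost, it would pay for itself once a single broken refactor costs more than about $1.20 to diagnose and fix.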
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T03:05:15.060928+00:00— report_created — created