Report #77752
[cost\_intel] Using Haiku or Flash for multi-file code refactoring, novel algorithm implementation, or cross-module bug fixes
Use Sonnet/Opus/GPT-4o for tasks requiring cross-file dependency tracking, novel algorithm design, or subtle business logic integration. Reserve smaller models for boilerplate generation, unit test scaffolding, single-function implementation, and docstring writing.
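The split above could be encoded as a small lookup-based router. This is a minimal sketch, not an established API: the `ModelTier` enum, the category labels, and `route_task` are hypothetical names chosen for illustration. Sending unknown categories to the frontier tier is an assumption made to err on the side of quality.

```python
from enum import Enum


class ModelTier(Enum):
    SMALL = "small"        # e.g. Haiku or Flash
    FRONTIER = "frontier"  # e.g. Sonnet, Opus, or GPT-4o


# Hypothetical category labels derived from the task lists in this report.
SMALL_MODEL_TASKS = {
    "boilerplate_generation",
    "unit_test_scaffolding",
    "single_function_implementation",
    "docstring_writing",
}

FRONTIER_TASKS = {
    "multi_file_refactoring",
    "novel_algorithm_design",
    "cross_module_bug_fix",
    "business_logic_integration",
}


def route_task(category: str) -> ModelTier:
    """Return the model tier for a task category.

    Anything not explicitly listed as small-model work goes to the
    frontier tier (an assumption: prefer quality when unsure).
    """
    if category in SMALL_MODEL_TASKS:
        return ModelTier.SMALL
    return ModelTier.FRONTIER
```

For example, `route_task("docstring_writing")` returns `ModelTier.SMALL`, while `route_task("cross_module_bug_fix")` and any unlisted category return `ModelTier.FRONTIER`.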
Journey Context:
Smaller models typically produce code that compiles and passes surface-level tests but fails on edge cases, introduces subtle bugs in error handling, and misses cross-module invariants. The quality degradation for code shows up in three layers: \(1\) correct syntax, wrong semantics, where the function runs but does not handle the edge cases the business logic requires; \(2\) local correctness, global incorrectness, where a change to one file breaks assumptions in another file that the model did not see or did not reason about; \(3\) pattern matching instead of reasoning, where smaller models reproduce common training patterns even when the specific context demands a deviation. On SWE-bench, frontier models resolve 2-5x more real GitHub issues than smaller models. The 5-20x per-token cost difference is justified when a single subtle bug costs hours of engineer debugging time. The practical routing rule: if the task requires reading more than one file to understand what to change, use a frontier model; if the task is 'implement a function with this signature that does X,' a small model suffices.
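The routing rule and the cost argument can be sketched in a few lines. The function names, parameters, and dollar figures below are hypothetical and not taken from this report. The break-even model is a deliberate simplification that assumes the only downside of the small model is a higher chance of one bug with a fixed debugging cost.

```python
def pick_tier(files_needed: int) -> str:
    """Apply the routing rule: more than one file to read means frontier."""
    if files_needed < 1:
        raise ValueError("files_needed must be at least 1")
    return "frontier" if files_needed > 1 else "small"


def break_even_bug_probability(
    small_cost: float,
    frontier_cost: float,
    engineer_hourly_rate: float,
    debug_hours: float,
) -> float:
    """Extra bug probability at which the frontier model pays for itself.

    If using the small model raises the chance of a subtle bug by more
    than the returned value, the frontier model is cheaper in expectation.
    All inputs are in the same currency; the model is a simplification.
    """
    bug_cost = engineer_hourly_rate * debug_hours
    if bug_cost <= 0:
        raise ValueError("bug cost must be positive")
    return (frontier_cost - small_cost) / bug_cost


# Hypothetical numbers: a 10x price gap (inside the 5-20x range above),
# a 100/hour engineer, and a bug that takes 2 hours to track down.
threshold = break_even_bug_probability(0.05, 0.50, 100.0, 2.0)
```

With these illustrative inputs the threshold is about 0.00225, meaning the frontier model pays for itself if it lowers the chance of such a bug by more than roughly 0.2 percentage points per task.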
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T13:06:40.723232+00:00— report_created — created