Report #86717
[cost_intel] Cheap models handle syntax-bound code tasks at up to 1/20th the cost with <5% quality drop vs frontier models
Route syntax-only transformations (formatting, linting, simple refactoring, regex generation) to Haiku or GPT-4o-mini; reserve Sonnet or GPT-4o for architectural decisions, complex debugging, and multi-file coordination. Measure AST correctness, not just human preference.
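A minimal routing sketch, assuming each task already carries a category label from some upstream classifier. The label strings, the `Tier` enum, the `files_touched` parameter, and the model identifier strings are all hypothetical placeholders, not real API model IDs; substitute whatever your provider and classifier actually use.

```python
from enum import Enum


class Tier(Enum):
    CHEAP = "cheap"        # Haiku / GPT-4o-mini class
    FRONTIER = "frontier"  # Sonnet / GPT-4o class


# Hypothetical labels for syntax-bound work (local AST transformations).
SYNTAX_BOUND = {
    "formatting",
    "linting",
    "simple_refactor",
    "regex_generation",
    "type_annotation",
    "method_extraction",
}

# Placeholder identifiers, not real API model names; substitute your provider's IDs.
MODELS = {
    Tier.CHEAP: "cheap-model-placeholder",
    Tier.FRONTIER: "frontier-model-placeholder",
}


def route(task_kind: str, files_touched: int = 1) -> Tier:
    """Return the model tier for a task, defaulting to the frontier tier."""
    # Only single-file, syntax-bound tasks are sent to the cheap tier.
    if task_kind in SYNTAX_BOUND and files_touched <= 1:
        return Tier.CHEAP
    # Architecture, complex debugging, multi-file work, and unknown labels
    # all fall through to the frontier tier.
    return Tier.FRONTIER


def model_for(task_kind: str, files_touched: int = 1) -> str:
    """Map a task to the placeholder model name for its tier."""
    return MODELS[route(task_kind, files_touched)]
```

For example, `model_for("formatting")` returns `"cheap-model-placeholder"`, while `model_for("simple_refactor", files_touched=3)` goes to the frontier tier because it needs multi-file coordination. Defaulting unknown labels to the frontier tier means a misclassified task costs more rather than silently losing quality.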
Journey Context:
Code tasks split into syntax-bound (local AST transformations) and semantic-bound (cross-file reasoning). Haiku ($0.25/1M input tokens) and GPT-4o-mini ($0.15/1M input tokens) achieve >95% AST correctness on formatting, type annotation, and simple method extraction. Sonnet ($3/1M) and GPT-4o ($2.50/1M) cost 12-20x more per input token (12x for Sonnet vs Haiku, about 17x for GPT-4o vs GPT-4o-mini, 20x for Sonnet vs GPT-4o-mini) with no significant improvement on these specific tasks. However, on tasks that require more than two reasoning steps (debugging race conditions, suggesting architecture), cheap models drop to 40% accuracy versus 85% for frontier models. The degradation signature is hallucinated imports, swapped argument order, and ignored existing method contracts.
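One way to measure AST correctness instead of human preference, sketched with Python's standard `ast` module. It assumes you have a reference for each task: for formatting-only tasks the reference is the original input; for other tasks it is assumed to be a reviewed reference solution. The function names are hypothetical. The same comparison catches swapped argument order against a reference, and a separate heuristic flags imports that appear in the output but not the input as candidate hallucinations. It will also flag legitimate new imports, and violations of existing method contracts still need tests or type checks rather than this comparison.

```python
import ast


def ast_equivalent(reference: str, output: str) -> bool:
    """True if both sources parse and produce identical ASTs."""
    try:
        ref_tree = ast.parse(reference)
        out_tree = ast.parse(output)
    except SyntaxError:
        return False
    # ast.dump omits line/column attributes by default, so whitespace and
    # layout differences compare equal, while reordered arguments do not.
    return ast.dump(ref_tree) == ast.dump(out_tree)


def imported_modules(source: str) -> set[str]:
    """Top-level names of absolute imports (raises SyntaxError if unparseable)."""
    names: set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            names.add(node.module.split(".")[0])
    return names


def new_imports(original: str, output: str) -> set[str]:
    """Modules the output imports that the original did not: candidate hallucinations."""
    return imported_modules(output) - imported_modules(original)


def ast_correctness_rate(pairs: list[tuple[str, str]]) -> float:
    """Fraction of (reference, output) pairs whose ASTs match."""
    if not pairs:
        return 0.0
    return sum(ast_equivalent(ref, out) for ref, out in pairs) / len(pairs)
```

For instance, `ast_equivalent("x=1", "x = 1")` is `True`, but `ast_equivalent("f(a, b)", "f(b, a)")` is `False`, which is exactly the swapped-argument failure described above.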
⚠ Workarounds are unverified; always check them before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T04:08:37.077651+00:00— report_created — created