Report #93353
[cost_intel] Cheap models (Haiku/GPT-3.5) fall off an accuracy cliff on tasks that require tracking implicit dependencies across more than 2 steps
Use expensive models (Claude Opus/GPT-4) for multi-file refactoring with more than 2 dependency hops; use cheap models only for isolated single-file linting; run a static-analysis pre-check that counts dependency depth before routing.
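The static-analysis pre-check in the recommendation can be sketched as follows. This is a minimal sketch, assuming a Python codebase and taking "dependency depth" to mean the number of import hops from the edited files. `import_graph`, `dependency_depth`, `route` and the `CHEAP`/`EXPENSIVE` labels are hypothetical names, not part of any vendor API; only the standard-library `ast` module is used, which parses imports without executing code. The report does not say how to route tasks with 1 or 2 hops, so `middle_tier` defaults to the expensive tier as an assumption, because the report only endorses cheap models for isolated single-file work.

```python
import ast
from collections import deque

# Placeholder tier labels (hypothetical): map them to concrete model IDs yourself.
CHEAP = "cheap"          # Haiku / GPT-3.5-class models
EXPENSIVE = "expensive"  # Opus / GPT-4-class models


def import_graph(sources: dict[str, str]) -> dict[str, set[str]]:
    """Map each module name to the in-project modules it imports (Python only)."""
    graph: dict[str, set[str]] = {}
    for name, code in sources.items():
        deps: set[str] = set()
        for node in ast.walk(ast.parse(code)):  # parses only; never executes code
            if isinstance(node, ast.Import):
                deps.update(alias.name for alias in node.names)
            elif isinstance(node, ast.ImportFrom) and node.module:
                # `from . import x` has module None and is skipped
                deps.add(node.module)
        # Keep references to project files only; drop stdlib and third-party imports.
        graph[name] = {d for d in deps if d in sources and d != name}
    return graph


def dependency_depth(graph: dict[str, set[str]], edited: list[str]) -> int:
    """Largest hop count from the edited modules to any module they transitively import.

    BFS records shortest hop counts, so a module reachable by several paths is
    counted at its nearest distance.
    """
    dist = {f: 0 for f in edited}
    queue = deque(edited)
    while queue:
        node = queue.popleft()
        for dep in graph.get(node, ()):
            if dep not in dist:  # visited check also stops import cycles
                dist[dep] = dist[node] + 1
                queue.append(dep)
    return max(dist.values(), default=0)


def route(graph: dict[str, set[str]], edited: list[str],
          hop_limit: int = 2, middle_tier: str = EXPENSIVE) -> str:
    """Pick a model tier from the dependency depth of the files being edited."""
    depth = dependency_depth(graph, edited)
    if len(edited) == 1 and depth == 0:
        return CHEAP        # isolated single-file task
    if depth > hop_limit:
        return EXPENSIVE    # more than `hop_limit` dependency hops
    return middle_tier      # 1..hop_limit hops: not covered by the report (assumption)


if __name__ == "__main__":
    sources = {
        "a": "import b\n",
        "b": "from c import helper\n",
        "c": "import d\n",
        "d": "import os\n",
    }
    g = import_graph(sources)
    print(route(g, ["a"]))  # a -> b -> c -> d is 3 hops: expensive
    print(route(g, ["d"]))  # imports only the stdlib: cheap
```

Because BFS keeps the shortest hop count, a file that is both imported directly and reached through a longer chain counts as the nearer one; a longest-path search over an acyclic import graph is the stricter alternative if that underestimates chains in practice. The sketch also does not check that the task is linting, only that it touches one file with no in-project imports.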
Journey Context:
Anthropic's Haiku and OpenAI's GPT-3.5-turbo cost $0.25-0.50 per million tokens, versus $15-30 for Opus/GPT-4. However, benchmarks on multi-file code editing show that cheap models fail on 'implicit dependency chains': tasks where file A references file B, which in turn references file C, so that editing A requires the model to infer the changes needed in C. Haiku's accuracy drops from 85% (single file) to 35% (3+ file dependencies), while Opus maintains 92%. Three cheap-model attempts ($0.75) cost more than one expensive call ($0.30) and still produce worse outcomes. The quality-degradation signature is compounding hallucination: each 'fix' introduces new errors in seemingly unrelated files.
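The cost comparison can be restated as expected spend per successful edit. A minimal sketch, assuming each retry succeeds independently at the reported accuracy for 3+ file tasks, that a success can be detected, and that one cheap attempt costs $0.25 (the $0.75 for three attempts divided by three). Independence is generous to the cheap model: with compounding hallucinations, real retries likely fare worse. The function names are illustrative only.

```python
# Per-attempt cost (USD) and accuracy on 3+ file dependency tasks, from the report.
# The $0.25 cheap-attempt cost is derived as $0.75 / 3 attempts.
CHEAP_COST, CHEAP_ACC = 0.25, 0.35          # Haiku
EXPENSIVE_COST, EXPENSIVE_ACC = 0.30, 0.92  # Opus


def expected_cost_per_success(cost_per_attempt: float, success_rate: float) -> float:
    """Expected spend until the first success, assuming independent retries (geometric)."""
    return cost_per_attempt / success_rate


def success_within(attempts: int, success_rate: float) -> float:
    """Probability that at least one of `attempts` independent tries succeeds."""
    return 1 - (1 - success_rate) ** attempts


if __name__ == "__main__":
    print(f"cheap:     ${expected_cost_per_success(CHEAP_COST, CHEAP_ACC):.2f} per success")
    print(f"expensive: ${expected_cost_per_success(EXPENSIVE_COST, EXPENSIVE_ACC):.2f} per success")
    # Three cheap attempts ($0.75) versus one expensive call ($0.30)
    print(f"3 cheap attempts succeed with p = {success_within(3, CHEAP_ACC):.3f}")
    print(f"1 expensive call succeeds with p = {success_within(1, EXPENSIVE_ACC):.3f}")
```

Under these assumptions the cheap tier costs about $0.71 per successful edit against about $0.33 for the expensive tier, and three cheap attempts reach roughly 72.5% success versus 92% for a single expensive call.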
Lifecycle
2026-06-22T15:16:55.507375+00:00 - report_created - created