Report #45950
[cost_intel] Using small models for multi-file refactoring or code with implicit business constraints
Use Haiku/mini for single-function generation, boilerplate, and well-specified CRUD. Switch to Sonnet/GPT-4o for multi-file changes, implicit constraint satisfaction (thread safety, transaction boundaries), and debugging subtle issues. The quality cliff between these task types is sharp, not gradual.
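The routing rule above can be sketched as a small decision function. This is a minimal illustration, not a tested policy: the `Task` fields, the `Tier` names, and the choice to escalate under-specified tasks are all assumptions made for the example.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    SMALL = "small"        # a Haiku- or mini-class model
    FRONTIER = "frontier"  # a Sonnet- or GPT-4o-class model


@dataclass(frozen=True)
class Task:
    """Hypothetical task descriptor; every field is an assumption of this sketch."""
    files_touched: int = 1
    fully_specified: bool = True        # all constraints are stated in the prompt
    implicit_constraints: bool = False  # e.g. thread safety, transaction boundaries
    is_debugging: bool = False          # diagnosing a subtle issue


def choose_tier(task: Task) -> Tier:
    """Escalate on the first risk signal.

    There is deliberately no gradual scoring: the report describes the
    quality cliff as sharp, so any single signal is enough to escalate.
    """
    if task.files_touched > 1:
        return Tier.FRONTIER
    if task.implicit_constraints or task.is_debugging:
        return Tier.FRONTIER
    if not task.fully_specified:
        # Assumption: an under-specified task probably hides implicit constraints.
        return Tier.FRONTIER
    return Tier.SMALL
```

In practice the hard part is filling in `implicit_constraints` honestly; when in doubt, treating it as true is the conservative choice.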
Journey Context:
Small models handle well-specified code generation within 5-10% of frontier quality on HumanEval-style benchmarks. The cliff appears on tasks that require understanding implicit constraints not stated in the prompt. There, small models produce code that compiles and passes unit tests but violates invariants: syntactically correct, semantically wrong code that passes superficial review. This is the most dangerous failure mode because it looks right. Frontier models are 20-30% better at inferring implicit constraints from surrounding context. Cost difference: Haiku at $0.25 per 1M input tokens vs Sonnet at $3 per 1M input tokens (12x). For boilerplate at scale, the 12x matters enormously. For critical business logic, the 20-30% gap in constraint satisfaction makes frontier models the only viable choice, because a single missed invariant can cost more than a year of API bills.
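To make the "passes unit tests but violates invariants" failure mode concrete, here is a hypothetical sketch of a missed transaction boundary using Python's standard `sqlite3` module. The table, the function names, and the `fail` flag used for fault injection are all invented for illustration; they are not taken from any real incident or model output.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Create an in-memory database with two accounts holding 100 in total."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE accounts (id TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
    conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("a", 100), ("b", 0)])
    conn.commit()
    return conn


def total(conn: sqlite3.Connection) -> int:
    """The implicit invariant: a transfer must never change this sum."""
    return conn.execute("SELECT SUM(balance) FROM accounts").fetchone()[0]


def credit(conn, account, amount, fail=False):
    """Credit an account; `fail` simulates a crash between debit and credit."""
    if fail:
        raise RuntimeError("simulated failure between debit and credit")
    conn.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, account))


def naive_transfer(conn, src, dst, amount, fail=False):
    """Looks correct and passes a happy-path test, but is not atomic."""
    conn.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src))
    conn.commit()  # the debit becomes durable on its own
    credit(conn, dst, amount, fail)
    conn.commit()


def atomic_transfer(conn, src, dst, amount, fail=False):
    """Debit and credit share one transaction, so the invariant survives a failure."""
    with conn:  # commits on success, rolls back if an exception escapes
        conn.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src))
        credit(conn, dst, amount, fail)
```

A happy-path unit test (`transfer(conn, "a", "b", 50)` followed by a balance check) passes for both versions. Only a test that injects a failure between the two updates separates them: after the failure, `naive_transfer` has already committed the debit, while `atomic_transfer` rolls it back.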
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T07:36:05.346445+00:00 - report_created - created