Report #58271
[cost_intel] Using a single model tier for all code generation tasks regardless of complexity
Tier code generation by complexity. Route boilerplate, CRUD operations, unit test scaffolding, simple bug fixes, and format conversions to a small model (Haiku, Flash, or mini); these tasks represent roughly 60-70% of coding task volume. Route architecture decisions, complex algorithms, multi-file refactoring, and debugging of subtle race conditions to a larger model (Sonnet, Pro, or GPT-4). The cheap tier then handles the majority of volume at 10-20x lower cost.
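The routing rule above can be sketched as a small lookup. This is a minimal illustration, not a definitive implementation: the category labels, the `Tier` enum, and the `route` function are hypothetical names introduced here, and sending unrecognized categories to the stronger tier is an assumption made for safety rather than something the report specifies.

```python
from enum import Enum


class Tier(Enum):
    CHEAP = "cheap"    # small models, e.g. Haiku, Flash, or mini
    STRONG = "strong"  # larger models, e.g. Sonnet, Pro, or GPT-4


# Hypothetical category labels mirroring the task lists in this report.
CHEAP_CATEGORIES = {
    "boilerplate",
    "crud",
    "unit_test_scaffolding",
    "simple_bug_fix",
    "format_conversion",
}


def route(category: str) -> Tier:
    """Pick a model tier for a task category.

    Anything not explicitly listed as cheap-tier work (architecture,
    complex algorithms, multi-file refactoring, race-condition debugging,
    or an unknown label) goes to the strong tier. Defaulting unknowns to
    the strong tier is an assumption of this sketch.
    """
    if category in CHEAP_CATEGORIES:
        return Tier.CHEAP
    return Tier.STRONG
```

For example, `route("crud")` returns `Tier.CHEAP`, while `route("multi_file_refactor")` returns `Tier.STRONG`. How a task gets its category label (manual tagging, keyword rules, or a classifier) is left open here.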
Journey Context:
Small models generate syntactically correct code for well-known patterns but fail on novel problems in a dangerous way. The quality signature is subtle: the code looks plausible, passes linting, and may even pass happy-path tests, but it contains logical errors or missed edge cases that a human reviewer might also miss at first glance. For boilerplate such as CRUD endpoints, database migrations, and standard library usage, small models have seen thousands of examples in training and perform reliably. For novel algorithmic problems, they hallucinate plausible-looking but incorrect solutions. The cost difference is stark: Claude Haiku at $1 per million output tokens vs Claude Sonnet at $15 per million output tokens is a 15x gap. If 65% of your code generation volume is boilerplate-tier, tiered routing saves roughly 60% of total code generation costs (0.65 × (1 − 1/15) ≈ 0.61), assuming boilerplate and complex tasks produce similar token counts. The routing heuristic: if the task can be described in under 50 words without ambiguity, it is probably boilerplate-tier.
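The savings arithmetic and the word-count heuristic can be sketched as follows. The function names `tiered_savings` and `looks_boilerplate` are hypothetical. The savings model assumes cost is proportional to token volume and that each tier sees similar tokens per task, which is a simplification. With a 65% cheap-tier share and a 15x price gap this model gives about 0.61; real savings will be lower if boilerplate tasks produce fewer tokens than complex ones or if input-token pricing has a smaller gap.

```python
def tiered_savings(cheap_share: float, price_ratio: float) -> float:
    """Fraction of spend saved by moving `cheap_share` of token volume
    to a tier that is `price_ratio` times cheaper.

    Assumes cost scales linearly with tokens and that the share is
    measured in tokens, not task count (an assumption of this sketch).
    """
    if not 0.0 <= cheap_share <= 1.0:
        raise ValueError("cheap_share must be between 0 and 1")
    if price_ratio <= 0:
        raise ValueError("price_ratio must be positive")
    return cheap_share * (1.0 - 1.0 / price_ratio)


def looks_boilerplate(task_description: str, max_words: int = 50) -> bool:
    """Apply only the length half of the heuristic: under `max_words` words.

    Whether the description is unambiguous cannot be judged by word count
    and still needs a human or a separate check.
    """
    return len(task_description.split()) < max_words


if __name__ == "__main__":
    print(round(tiered_savings(0.65, 15), 3))  # prints 0.607
    print(looks_boilerplate("Add a GET endpoint that returns a user by id"))  # prints True
```

A short description is a necessary signal here, not a sufficient one: a brief but ambiguous request such as "fix the flaky sync" would pass the length check and still belong on the stronger tier.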
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T04:17:58.168286+00:00— report_created — created