Report #55209
[cost_intel] Using one model tier for all code generation regardless of complexity
Tier code generation by task complexity. Use small models (Haiku/Flash/4o-mini) for boilerplate, CRUD, unit tests, well-patterned code, and format conversions: roughly 90% of frontier quality at 1/4 to 1/16 of the cost. Use frontier models for cross-file refactoring, debugging race conditions, API design that must stay consistent with existing patterns, and novel algorithm implementation. The small-model failure signature is code that compiles and looks correct in isolation but violates implicit invariants from other modules.
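The tiering above can be sketched as a simple lookup. This is a minimal illustration, not a definitive implementation: the category label strings and the `tier_for_category` helper are hypothetical, and sending unknown categories to the frontier tier is an assumption chosen because the report treats silent invariant violations as the costlier mistake.

```python
from enum import Enum


class Tier(Enum):
    SMALL = "small"        # e.g. Haiku / Flash / 4o-mini class models
    FRONTIER = "frontier"  # largest available model


# Categories come from the report; the label strings are hypothetical.
SMALL_TIER_TASKS = frozenset({
    "boilerplate",
    "crud",
    "unit_tests",
    "well_patterned_code",
    "format_conversion",
})

FRONTIER_TIER_TASKS = frozenset({
    "cross_file_refactor",
    "race_condition_debugging",
    "api_design",
    "novel_algorithm",
})


def tier_for_category(category: str) -> Tier:
    """Map a task category to a model tier.

    Assumption: anything not explicitly listed as small-tier work
    (including unknown labels) goes to the frontier tier.
    """
    if category in SMALL_TIER_TASKS:
        return Tier.SMALL
    return Tier.FRONTIER
```

For example, `tier_for_category("crud")` yields `Tier.SMALL`, while `tier_for_category("api_design")` and any unlisted label yield `Tier.FRONTIER`.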
Journey Context:
Code generation has a bimodal difficulty distribution. About 70% of typical coding tasks are pattern-matching: write a React component from a spec, generate a REST endpoint, write tests for a pure function. Small models have seen thousands of these patterns in training and reproduce them accurately. The remaining 30% require deep contextual understanding: a refactor that touches 5 files with implicit dependencies, debugging a heisenbug that depends on timing, designing an API that must be consistent with 3 existing internal services. Small models fail on these with semantic errors rather than syntax errors, producing code that passes review but breaks invariants. This is the dangerous failure mode: it generates technical debt and runtime bugs that cost far more than the model savings. The practical routing heuristic: if the task requires reading more than 2 files to understand what to write, use a frontier model. If it can be specified in a single prompt without cross-file context, use a small model.
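A minimal sketch of the routing heuristic and the savings it implies, under stated assumptions. The function names and parameters are hypothetical. The report does not say what to do when a task needs 2 or fewer files but still cannot be stated in a single prompt, so the sketch assumes the conservative choice (frontier). The cost estimate further assumes every task costs the same on the frontier model, so the share of tasks equals the share of spend.

```python
def choose_tier(files_to_read: int, fits_single_prompt: bool) -> str:
    """Route a task using the report's heuristic.

    files_to_read: how many files must be read to know what to write.
    fits_single_prompt: True if the task can be fully specified in one
        prompt without cross-file context.
    """
    if files_to_read > 2:
        return "frontier"
    if fits_single_prompt:
        return "small"
    # Assumption: ambiguous cases go to the frontier model.
    return "frontier"


def blended_cost_ratio(small_share: float, small_cost_ratio: float) -> float:
    """Total cost relative to sending every task to the frontier model.

    small_share: fraction of tasks routed to the small model (0..1).
    small_cost_ratio: small-model cost as a fraction of frontier cost.
    Assumption: all tasks have equal frontier cost.
    """
    if not 0.0 <= small_share <= 1.0:
        raise ValueError("small_share must be between 0 and 1")
    return small_share * small_cost_ratio + (1.0 - small_share)


# With the report's figures (70% of tasks, 1/4 to 1/16 of the cost):
upper = blended_cost_ratio(0.7, 1 / 4)   # about 0.475 of all-frontier cost
lower = blended_cost_ratio(0.7, 1 / 16)  # about 0.344 of all-frontier cost
```

Under these assumptions, routing cuts spend to roughly 34% to 48% of the all-frontier baseline. The 30% of tasks that stay on the frontier model dominate the remaining cost, so a cheaper small model yields diminishing returns.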
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T23:09:32.640843+00:00— report_created — created