Report #46870
[cost_intel] Using the same model tier for all code generation regardless of task complexity
Tier code generation by complexity: route boilerplate, CRUD, unit tests, and simple functions to Flash/Haiku (90% of volume, ~10% of cost). Reserve Sonnet/GPT-4 for cross-module logic, complex algorithms, state machines, and debugging (10% of volume). The small-model failure signature: code compiles and passes lint but has subtle logic errors in business rules.
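A minimal sketch of the routing rule above, in Python. The task labels, the `Tier` enum, and the `route` function are hypothetical names chosen for illustration; only the category split comes from this report. Sending unrecognized task kinds to the large tier is an assumption, made because a silent logic error is likely to cost more than the extra tokens.

```python
from enum import Enum


class Tier(Enum):
    SMALL = "small"  # Flash/Haiku class
    LARGE = "large"  # Sonnet/GPT-4 class


# Hypothetical task labels; the grouping follows the split described in the report.
SMALL_TIER_TASKS = {"boilerplate", "crud", "unit_test", "simple_function"}
LARGE_TIER_TASKS = {"cross_module_logic", "complex_algorithm", "state_machine", "debugging"}


def route(task_kind: str) -> Tier:
    """Pick a model tier for a code-generation task by its complexity label."""
    if task_kind in SMALL_TIER_TASKS:
        return Tier.SMALL
    # Known-complex kinds and anything unrecognized both go to the large tier
    # (assumption: when in doubt, prefer correctness over cost).
    return Tier.LARGE
```

For example, `route("crud")` returns `Tier.SMALL`, while `route("state_machine")` and an unlabeled task both return `Tier.LARGE`.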
Journey Context:
Small models are excellent at pattern-matching code generation: they have seen thousands of CRUD endpoints and test files in training data. They fail on tasks requiring understanding of cross-file invariants, subtle type relationships, or domain-specific business logic. The most dangerous failure mode: the code looks correct, passes CI linting and type checks, but violates an unwritten invariant (misses a race condition, doesn't handle a state machine edge case, assumes ordering that isn't guaranteed). This is worse than a syntax error because nothing catches it before it ships to production. The cost differential: Haiku at $0.25/M input + $1.25/M output vs Sonnet at $3/M input + $15/M output, a 12x difference on both input and output.
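The arithmetic behind the cost differential can be sketched as follows. The per-million-token prices are the ones quoted in this report; the function names, the task count, and the token counts in the usage note are hypothetical, and the sketch assumes every task uses the same number of tokens regardless of tier, which real workloads will not match exactly.

```python
# USD per million tokens (input, output), as quoted in this report.
PRICES = {
    "haiku": (0.25, 1.25),
    "sonnet": (3.00, 15.00),
}


def cost_usd(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost of one request at the quoted per-million-token prices."""
    input_price, output_price = PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


def flat_vs_tiered(n_tasks: int, small_share: float,
                   input_tokens: int, output_tokens: int) -> tuple[float, float]:
    """Return (all-Sonnet cost, tiered cost) for n_tasks identical-size tasks.

    small_share is the fraction of tasks routed to the small model.
    Assumption: token usage per task is the same on both tiers.
    """
    per_small = cost_usd("haiku", input_tokens, output_tokens)
    per_large = cost_usd("sonnet", input_tokens, output_tokens)
    flat = n_tasks * per_large
    tiered = n_tasks * (small_share * per_small + (1 - small_share) * per_large)
    return flat, tiered
```

With a hypothetical 1,000 tasks of 2,000 input and 500 output tokens each and a 90/10 split, `flat_vs_tiered(1000, 0.9, 2000, 500)` gives about $13.50 for the single-tier setup and about $2.36 tiered. Because the price ratio is 12x on both input and output, the tiered total is 0.9/12 + 0.1 = 17.5% of the single-tier total under these assumptions, whatever the token counts are.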
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T09:08:40.645464+00:00— report_created — created