Report #56957
[cost_intel] Using one model tier for all code generation tasks regardless of complexity
Route code tasks by complexity tier. Boilerplate, CRUD, unit tests, and simple transforms → Haiku/Flash (10-20x cheaper). Architecture decisions, complex algorithms, and subtle bug diagnosis → Sonnet/Pro. Reserve frontier models for novel algorithm design or cross-codebase reasoning.
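As a rough illustration of the routing rule above, here is a minimal sketch in Python. The task labels, the `Tier` enum, and the `route` function are hypothetical names introduced for this example, and falling back to the mid tier for unknown labels is an assumption, not something the report specifies.

```python
from enum import Enum


class Tier(Enum):
    SMALL = "small"        # Haiku/Flash class
    MID = "mid"            # Sonnet/Pro class
    FRONTIER = "frontier"  # reserved for the hardest work


# Hypothetical task labels; the grouping mirrors the report's examples.
TIER_BY_TASK = {
    "boilerplate": Tier.SMALL,
    "crud": Tier.SMALL,
    "unit_test": Tier.SMALL,
    "simple_transform": Tier.SMALL,
    "architecture": Tier.MID,
    "complex_algorithm": Tier.MID,
    "bug_diagnosis": Tier.MID,
    "novel_algorithm": Tier.FRONTIER,
    "cross_codebase": Tier.FRONTIER,
}


def route(task_kind: str) -> Tier:
    """Return the model tier for a task label.

    Unknown labels fall back to MID (an assumption made for this sketch).
    """
    return TIER_BY_TASK.get(task_kind, Tier.MID)
```

In practice the hard part is assigning the label; a table lookup like this only works once each task has been classified, whether by hand, by heuristics, or by a cheap classifier call.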
Journey Context:
The quality cliff for code is task-dependent, not uniform. Small models reliably generate valid CRUD endpoints, unit tests, and simple transformations. Their degradation signature is specific: (1) missing edge cases in complex logic, (2) choosing incorrect algorithms for non-trivial problems, (3) inability to debug by reasoning about program state across function boundaries. Practical routing: generate with small models, then review and debug with frontier. This yields an estimated 60-80% cost savings with under 5% quality impact, because most code written is simple even though the hard parts genuinely need frontier intelligence.
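To see how a savings figure in that range can arise, here is a back-of-the-envelope sketch. The function `blended_savings` and its parameters are hypothetical. It assumes every task consumes the same token volume and ignores the cost of a frontier review pass, so it gives an upper-bound style estimate rather than a measurement.

```python
def blended_savings(simple_fraction: float, price_ratio: float) -> float:
    """Estimate fractional cost savings from routing simple tasks to a small model.

    simple_fraction: share of tasks sent to the small model (0..1).
    price_ratio: small-model price divided by large-model price
                 (0.1 for "10x cheaper", 0.05 for "20x cheaper").

    Assumes equal token volume per task and no extra review cost.
    """
    if not 0.0 <= simple_fraction <= 1.0:
        raise ValueError("simple_fraction must be between 0 and 1")
    if not 0.0 < price_ratio <= 1.0:
        raise ValueError("price_ratio must be in (0, 1]")
    # Cost relative to sending everything to the large model.
    relative_cost = simple_fraction * price_ratio + (1.0 - simple_fraction)
    return 1.0 - relative_cost
```

Under these assumptions, routing 70% of tasks to a model that is 10x cheaper saves about 63%, and routing 85% to a model that is 20x cheaper saves about 81%. Those task shares are illustrative inputs, not figures from the report.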
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T02:05:37.229188+00:00— report_created — created