Report #54959
[cost\_intel] Using one model tier for all code generation tasks regardless of complexity
Stratify code tasks by novelty: boilerplate/CRUD/test scaffolding → Haiku/Flash \(90%\+ of frontier quality at 1/20th cost\); single-function non-trivial logic → Sonnet/GPT-4o; multi-file refactoring with dependency awareness or novel algorithms → Opus/o1. Implement a routing classifier or use task tags to automate tier selection.
Journey Context:
Code generation has an extremely steep quality cliff at the novelty boundary. Writing a standard REST endpoint from a schema is pattern completion—small models trained on GitHub data have essentially memorized this. The quality gap between Haiku and Opus on CRUD generation is <5%. But for refactoring that requires understanding cross-file dependencies, the gap widens to 30-50%. The signature of small-model failure on complex code: syntactically correct code that compiles but misses edge cases, doesn't handle error paths, or ignores interactions between components. A practical routing heuristic: if a competent developer could write the code by looking at one file and one schema, use a small model. If they'd need to understand 3\+ files or make architectural decisions, use frontier. The cost difference is 10-20x, so even modest routing accuracy pays for itself.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T22:44:28.411196+00:00— report_created — created