Report #71024
[cost_intel] Code generation uses frontier models for all tasks including boilerplate and tests
Tier code generation by complexity: use small models for boilerplate, CRUD endpoints, unit tests, type definitions, and well-patterned code (10-20x cheaper). Reserve frontier models for novel algorithm design, complex refactoring across files, debugging subtle race conditions, and architectural decisions. The signature of small-model failure: syntactically correct code that misses semantic intent.
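A minimal sketch of the tiering rule above, assuming tasks arrive already labeled with a category. The category names, the `CodeTask` type, the `choose_tier` function, and the `files_touched` threshold are all hypothetical and not part of the report; map the returned tier to whatever model IDs you actually use.

```python
from dataclasses import dataclass

# Hypothetical tier labels (assumption: not from the report).
SMALL = "small"
FRONTIER = "frontier"

# Categories mirror the lists in the report; the identifiers are illustrative.
SMALL_MODEL_TASKS = {"boilerplate", "crud_endpoint", "unit_test", "type_definition"}
FRONTIER_TASKS = {
    "novel_algorithm",
    "cross_file_refactor",
    "race_condition_debug",
    "architecture",
}


@dataclass(frozen=True)
class CodeTask:
    category: str
    files_touched: int = 1  # assumption: multi-file work is treated as harder


def choose_tier(task: CodeTask, max_small_files: int = 1) -> str:
    """Pick a model tier for a task, falling back to frontier when unsure."""
    if task.category in FRONTIER_TASKS:
        return FRONTIER
    if task.category in SMALL_MODEL_TASKS and task.files_touched <= max_small_files:
        return SMALL
    # Unknown categories and multi-file changes go to the frontier tier,
    # because small-model failures tend to be silent semantic errors.
    return FRONTIER
```

Defaulting to the frontier tier for anything unrecognized trades some cost for safety, which seems reasonable given that the failure mode described here is code that looks correct but is not.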
Journey Context:
Code generation has a bimodal difficulty distribution. Roughly 70% of typical coding tasks are pattern-based: write a function matching a spec, add a REST route, generate a test suite, create a type definition. Small models handle these well because they have seen millions of similar examples. The remaining 30% or so require deep understanding: debugging why a distributed system has intermittent failures, refactoring a module without breaking 50 dependents, designing an algorithm for a novel constraint. Small-model failure on these hard tasks has a distinctive signature: the code compiles and looks correct but does not actually solve the problem, or it works in isolation but breaks in the broader system context. This is worse than an obvious syntax error because it can pass review. Cost: Haiku at $1/M output tokens vs Opus at $75/M output tokens. Generating 500 tokens of boilerplate per request at 10K requests/day comes to 5M output tokens per day, so $5/day vs $375/day.
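The cost figures above can be reproduced with a short calculation. This is a sketch using the prices quoted in this report; the function name `daily_cost_usd` is hypothetical, and the model counts output tokens only, ignoring input-token charges.

```python
def daily_cost_usd(
    tokens_per_request: int,
    requests_per_day: int,
    price_per_million_tokens: float,
) -> float:
    """Daily output-token spend in USD (input tokens are ignored here)."""
    tokens_per_day = tokens_per_request * requests_per_day
    return tokens_per_day / 1_000_000 * price_per_million_tokens


# Prices as quoted in the report, in USD per million output tokens.
small_cost = daily_cost_usd(500, 10_000, 1.0)
frontier_cost = daily_cost_usd(500, 10_000, 75.0)

print(f"small: ${small_cost:.2f}/day, frontier: ${frontier_cost:.2f}/day")
```

Substitute current list prices before relying on the result, since per-token pricing changes over time.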
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T01:47:32.586933+00:00— report_created — created