Report #70939
[cost_intel] Using GPT-4o for all code generation, including boilerplate CRUD and unit test scaffolding
Route syntax-only transformations (linting, type hint insertion, simple refactoring) and boilerplate generation (DTOs, serializers, standard CRUD) to GPT-4o-mini; reserve GPT-4o for architectural refactoring, cross-file dependency analysis, and debugging that requires tracing execution across more than 3 hops. GPT-4o-mini matches GPT-4o on 85% of boilerplate tasks at 15x lower cost.
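A minimal sketch of this routing rule, assuming the caller can already label each request with a task kind, the number of files in context, and the number of execution hops to trace (how those signals are detected is out of scope). `TaskKind`, `CodeTask`, `pick_model`, and the threshold constants are hypothetical names; the 4-file threshold comes from the Journey Context in this report. The report does not assign shallow debugging (3 hops or fewer) to either model; this sketch sends it to the smaller model as an interpretation, not a stated rule.

```python
from dataclasses import dataclass
from enum import Enum, auto

# Model names as used in this report; adjust to your provider's identifiers.
SMALL_MODEL = "gpt-4o-mini"
FRONTIER_MODEL = "gpt-4o"

# Quality-cliff thresholds quoted in this report (hypothetical constant names).
MAX_FILES_FOR_SMALL = 4  # cross-file context beyond 4 files degrades small models
MAX_HOPS_FOR_SMALL = 3   # execution tracing beyond 3 hops degrades small models


class TaskKind(Enum):
    # Syntax-only transformations
    LINT = auto()
    TYPE_HINTS = auto()
    SIMPLE_REFACTOR = auto()
    # Boilerplate generation
    DTO = auto()
    SERIALIZER = auto()
    CRUD = auto()
    TEST_SCAFFOLD = auto()
    # Reasoning-heavy work
    ARCH_REFACTOR = auto()
    CROSS_FILE_ANALYSIS = auto()
    DEBUG = auto()


# Kinds the small model may handle when no cliff threshold is crossed.
# DEBUG is included on the ASSUMPTION that shallow debugging (<= 3 hops) is safe.
SMALL_MODEL_KINDS = {
    TaskKind.LINT, TaskKind.TYPE_HINTS, TaskKind.SIMPLE_REFACTOR,
    TaskKind.DTO, TaskKind.SERIALIZER, TaskKind.CRUD, TaskKind.TEST_SCAFFOLD,
    TaskKind.DEBUG,
}


@dataclass
class CodeTask:
    kind: TaskKind
    files_in_context: int = 1  # source files the model must reason across
    trace_hops: int = 0        # call/execution hops that must be traced


def pick_model(task: CodeTask) -> str:
    """Return the model name for a task (hypothetical helper, not a library API)."""
    # Any quality-cliff signal escalates to the frontier model, whatever the kind.
    if task.files_in_context > MAX_FILES_FOR_SMALL or task.trace_hops > MAX_HOPS_FOR_SMALL:
        return FRONTIER_MODEL
    if task.kind in SMALL_MODEL_KINDS:
        return SMALL_MODEL
    # Architectural refactoring, cross-file analysis, and any unlisted kind.
    return FRONTIER_MODEL
```

Unlisted task kinds fall through to GPT-4o, so a mislabeled or new task type fails toward quality rather than toward cost.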
Journey Context:
Code generation has a bimodal distribution: 70% of generated LOC is boilerplate, where smaller models achieve compile rates above 95%, while the remaining 30% requires complex reasoning. Teams often use frontier models for everything, paying 15-30x more per token. The quality cliff for small models appears when cross-file context exceeds 4 files or when debugging requires tracing execution across more than 3 hops. For isolated functions under 50 lines, the quality gap is within the margin of error. Signature mistake: using a $5/1M-token model for $0.30/1M-token work.
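The savings claim can be checked with back-of-the-envelope arithmetic. This sketch assumes the 70% LOC share translates into a 70% token share and applies the single per-1M-token prices quoted above to every token (real provider pricing usually separates input and output tokens). `blended_cost_per_million` and `savings_fraction` are hypothetical helpers.

```python
# Per-1M-token prices quoted in this report (USD); check current provider pricing.
FRONTIER_PRICE = 5.00  # frontier model, e.g. GPT-4o
SMALL_PRICE = 0.30     # small model, e.g. GPT-4o-mini


def blended_cost_per_million(boilerplate_share: float,
                             small_price: float = SMALL_PRICE,
                             frontier_price: float = FRONTIER_PRICE) -> float:
    """Cost of 1M tokens when `boilerplate_share` of them go to the small model."""
    if not 0.0 <= boilerplate_share <= 1.0:
        raise ValueError("boilerplate_share must be between 0 and 1")
    return boilerplate_share * small_price + (1.0 - boilerplate_share) * frontier_price


def savings_fraction(boilerplate_share: float) -> float:
    """Fraction of spend saved versus sending everything to the frontier model."""
    return 1.0 - blended_cost_per_million(boilerplate_share) / FRONTIER_PRICE


if __name__ == "__main__":
    share = 0.70  # boilerplate share from the report, ASSUMED equal to token share
    print(f"all frontier: ${FRONTIER_PRICE:.2f} per 1M tokens")
    print(f"routed:       ${blended_cost_per_million(share):.2f} per 1M tokens")  # $1.71
    print(f"saving:       {savings_fraction(share):.1%}")                         # 65.8%
```

Under these assumptions, routing boilerplate to the smaller model cuts spend by roughly two thirds; the saving is capped by the 30% of work that still needs the frontier model, no matter how cheap the small model is.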
⚠ Workarounds are unverified; always check them before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T01:39:12.057563+00:00— report_created — created