Report #45895
[cost\_intel] COST\_INTEL: Using GPT-4o for all code generation when cheaper models suffice for boilerplate
Route syntactic boilerplate \(type definitions, API clients\) to GPT-4o-mini \($0.15/1M input tokens\); reserve GPT-4o or Claude 3.5 Sonnet for algorithmic logic with more than 3 nested conditionals or for cross-file architectural decisions
Journey Context:
Code generation has a bimodal difficulty distribution. Category 1 \(70% of tasks\) is boilerplate: JSON parsing, Pydantic models, React components with standard props. These tasks require syntax compliance but minimal reasoning. GPT-4o-mini achieves 95% accuracy here at roughly 1/30th the input cost of GPT-4o \($0.15 vs $5 per 1M input tokens; $0.60/1M is GPT-4o-mini's output price\). Category 2 \(30%\) covers complex refactoring, architectural decisions spanning multiple files, and performance optimization. Mini models fail on these with characteristic signatures: infinite loops, incorrect variable scoping, and missing edge cases. The degradation is easy to miss because mini models generate plausible-looking code that passes a syntax check but fails on runtime edge cases \(e.g., off-by-one errors in pagination\). Routing heuristic: if the prompt contains words like 'refactor', 'optimize', or 'architecture', or the file context exceeds 3 files, use the expensive model; otherwise use mini. Order of magnitude: roughly 30x cost difference with 95% quality retention for boilerplate.
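The routing heuristic described above can be sketched in a few lines of Python. This is a minimal illustration, not a tested production router: the function names `choose_model` and `blended_price`, the model identifier strings, and the prefix-style keyword matching are all assumptions made for the example. Only the keyword list, the 3-file threshold, and the 70/30 task split come from the report.

```python
import re

# Model identifiers are placeholders (assumption); substitute whatever
# names your provider actually exposes.
CHEAP_MODEL = "gpt-4o-mini"
EXPENSIVE_MODEL = "gpt-4o"

# Keywords from the report's heuristic, matched as word prefixes so that
# "refactoring", "optimization", and "architectural" also trigger.
COMPLEX_KEYWORDS = re.compile(
    r"\b(refactor|optimi[sz](?:e|ing|ation)|architect)", re.IGNORECASE
)

# The report's threshold: more than 3 files of context.
MAX_CHEAP_CONTEXT_FILES = 3


def choose_model(prompt: str, context_files: int = 0) -> str:
    """Return the model name to use for a code-generation prompt."""
    if context_files > MAX_CHEAP_CONTEXT_FILES:
        return EXPENSIVE_MODEL
    if COMPLEX_KEYWORDS.search(prompt):
        return EXPENSIVE_MODEL
    return CHEAP_MODEL


def blended_price(cheap_price: float, expensive_price: float,
                  cheap_share: float = 0.70) -> float:
    """Average price per token unit when cheap_share of traffic goes to
    the cheap model and the remainder to the expensive one."""
    return cheap_share * cheap_price + (1.0 - cheap_share) * expensive_price


if __name__ == "__main__":
    print(choose_model("Generate a Pydantic model for a User record"))
    print(choose_model("Refactor the billing module"))
    print(choose_model("Add a typed API client", context_files=4))
```

Prices are passed into `blended_price` as arguments rather than hard-coded, since list prices change. Even with a large per-token price gap, the overall saving is bounded by the 30% of traffic that still goes to the expensive model, so the blended price falls by much less than 30x.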
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T07:30:42.318054+00:00— report_created — created