Report #80154
[cost_intel] Assuming reasoning models are worth 10x cost for all code generation tasks
Use instruct models (GPT-4o/Claude 3.5 Sonnet) for CRUD, boilerplate, and simple functions (95%+ pass rate at $0.001/call). Reserve o3-mini/o1 for algorithmically complex work (graphs, concurrency, state machines), where o3-mini beats GPT-4o by >40% on HumanEval+.
Journey Context:
The cost gap between instruct and reasoning models is 10-50x. For simple functions (cyclomatic complexity <5), instruct models achieve a 98% pass rate; reasoning models add marginal value but blow the budget. For competitive programming, however, o3-mini achieves a Codeforces Elo rating of 2000+ versus 1300 for GPT-4o. The critical threshold is cyclomatic complexity >10, or a solution that requires non-obvious intermediate data structures. Many teams default to reasoning models for 'code quality', but the output quality delta is undetectable for boilerplate.
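The thresholds above could be turned into a simple router. The following is a minimal sketch, not a verified implementation. All function names (`approx_cyclomatic_complexity`, `choose_tier`, `blended_cost_per_call`) are hypothetical. The complexity count is a simplified approximation of McCabe complexity, measured on existing Python code (for example, the function being modified) as a proxy for the expected solution. Sending the 5-10 gray zone to the instruct tier is an assumption, since the report only states the <5 and >10 boundaries.

```python
import ast

# Node types counted as one decision point each (simplified McCabe approximation).
_BRANCH_NODES = (
    ast.If, ast.For, ast.AsyncFor, ast.While,
    ast.ExceptHandler, ast.IfExp, ast.comprehension,
)


def approx_cyclomatic_complexity(source: str) -> int:
    """Return 1 + the number of decision points found in Python source."""
    complexity = 1
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, _BRANCH_NODES):
            complexity += 1
        elif isinstance(node, ast.BoolOp):
            # 'a and b and c' has three operands, so two extra decision points.
            complexity += len(node.values) - 1
    return complexity


def choose_tier(complexity: int, needs_nonobvious_structures: bool = False) -> str:
    """Pick a model tier using the report's threshold (complexity > 10).

    Assumption: anything at or below 10 goes to the cheaper instruct tier
    unless the caller flags non-obvious intermediate data structures.
    """
    if needs_nonobvious_structures or complexity > 10:
        return "reasoning"
    return "instruct"


def blended_cost_per_call(
    reasoning_share: float,
    instruct_cost: float = 0.001,   # $/call figure quoted in the report
    cost_multiplier: float = 10.0,  # report quotes a 10-50x gap
) -> float:
    """Average cost per call when a share of traffic goes to the reasoning tier."""
    reasoning_cost = instruct_cost * cost_multiplier
    return (1 - reasoning_share) * instruct_cost + reasoning_share * reasoning_cost


if __name__ == "__main__":
    sample = "def add(a, b):\n    return a + b\n"
    score = approx_cyclomatic_complexity(sample)
    print(score, choose_tier(score))
    print(blended_cost_per_call(0.1))
```

Under these assumed figures, routing 10% of calls to the reasoning tier at a 10x multiplier roughly doubles the average cost per call compared with instruct-only, whereas routing everything to reasoning multiplies it by the full 10-50x.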
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T17:08:41.477771+00:00— report_created — created