Report #83011
[cost_intel] When do reasoning models underperform on coding tasks despite higher cost?
For boilerplate CRUD, API glue code, and straightforward refactors, Claude 3.5 Sonnet (a non-reasoning model) outperforms o1-preview on speed, cost, and context window utilization. Reserve o1/o3 for complex algorithmic logic, concurrency bugs, or architectural decisions spanning more than 10 files.
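The routing rule above can be sketched as a small function. This is a minimal illustration, not a tested policy. The `TaskKind` categories, the `choose_model` helper, and the returned label strings are hypothetical, and the labels are descriptive names, not API model identifiers. Only the task categories and the 10-file threshold come from the report.

```python
from enum import Enum, auto


class TaskKind(Enum):
    """Task categories named in the report (the enum itself is illustrative)."""
    CRUD = auto()
    API_GLUE = auto()
    SIMPLE_REFACTOR = auto()
    ALGORITHMIC = auto()
    CONCURRENCY_BUG = auto()
    ARCHITECTURE = auto()


FILE_THRESHOLD = 10  # "architectural decisions spanning more than 10 files"

# Hypothetical descriptive labels; these are not API model identifiers.
NON_REASONING = "claude-3.5-sonnet"
REASONING = "o1/o3"


def choose_model(kind: TaskKind, files_touched: int) -> str:
    """Route a coding task to a model family per the report's heuristic."""
    if kind in (TaskKind.ALGORITHMIC, TaskKind.CONCURRENCY_BUG):
        return REASONING
    if kind is TaskKind.ARCHITECTURE and files_touched > FILE_THRESHOLD:
        return REASONING
    # Boilerplate CRUD, glue code, simple refactors, and small-scope
    # architecture work default to the cheaper, faster model.
    return NON_REASONING


print(choose_model(TaskKind.API_GLUE, 3))
print(choose_model(TaskKind.ARCHITECTURE, 14))
```

Sending architecture work that touches 10 or fewer files to the non-reasoning model is one reading of the report's cutoff. The report itself does not say what to do below it.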
Journey Context:
Reasoning models spend tokens "thinking" through patterns that are already obvious. Because those reasoning tokens occupy context-window space during generation, the models hit context limits sooner on large codebases. They excel at deep logic but fall behind on high-volume "boring" code. For simple glue code, the cost per line of correct code is 5x higher because the models generate unnecessary reasoning chains for trivial patterns. They also have higher latency, which breaks flow state during iterative coding.
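To make the cost-per-correct-line comparison concrete, here is a minimal sketch of the calculation. It assumes hidden reasoning tokens are billed at the output-token rate, as OpenAI documents for its o-series models. The `Run` record, the token counts, and the per-million prices are hypothetical values chosen for illustration, not measured data, and they are not meant to reproduce the 5x figure.

```python
from dataclasses import dataclass


@dataclass
class Run:
    """Token usage and outcome for one coding task (hypothetical record)."""
    input_tokens: int
    visible_output_tokens: int
    reasoning_tokens: int    # hidden "thinking" tokens; 0 for a non-reasoning model
    correct_lines: int       # lines of code that passed review/tests
    usd_per_m_input: float   # assumed price per million input tokens
    usd_per_m_output: float  # assumed price per million output tokens


def cost_per_correct_line(run: Run) -> float:
    """Total USD cost divided by the number of correct lines produced."""
    if run.correct_lines <= 0:
        raise ValueError("no correct lines; cost per line is undefined")
    # Assumption: reasoning tokens are billed at the output rate, like visible output.
    billed_output = run.visible_output_tokens + run.reasoning_tokens
    total_usd = (
        run.input_tokens * run.usd_per_m_input
        + billed_output * run.usd_per_m_output
    ) / 1_000_000
    return total_usd / run.correct_lines


# Invented numbers: identical task and prices, differing only in reasoning tokens.
plain = Run(2000, 1000, 0, 40, 3.0, 15.0)
thinking = Run(2000, 1000, 4000, 40, 3.0, 15.0)
ratio = cost_per_correct_line(thinking) / cost_per_correct_line(plain)
print(f"reasoning run costs {ratio:.1f}x more per correct line")
```

With these invented numbers (same prompt, visible output, correct-line count, and prices), 4,000 extra reasoning tokens alone make the reasoning run roughly 3.9x more expensive per correct line. Real ratios depend on actual prices, token counts, and how many generated lines turn out to be correct.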
⚠ Workarounds are unverified; always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T21:55:25.840799+00:00— report_created — created