Report #49800
[cost_intel] When reasoning models hurt code quality despite 20x cost
Use GPT-4o or GPT-4o-mini for single-file CRUD/API endpoints (<200 LOC). Use o1 only when the code requires >3 abstraction layers, novel algorithms, or cross-file architecture. Watch for an over-engineering smell in reasoning-model output.
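The rule above can be sketched as a small routing function. This is only an illustration: the `Task` fields and the `pick_model` name are hypothetical, and how you measure abstraction layers or novelty is left to the caller. LOC is deliberately not a trigger, since the rule escalates to o1 only on the three listed conditions.

```python
from dataclasses import dataclass


@dataclass
class Task:
    """Hypothetical description of a coding task; field names are assumptions."""
    abstraction_layers: int = 1
    novel_algorithm: bool = False
    cross_file: bool = False


def pick_model(task: Task) -> str:
    """Return "o1" only when one of the report's three escalation conditions holds."""
    needs_reasoning = (
        task.abstraction_layers > 3   # more than 3 abstraction layers
        or task.novel_algorithm       # novel algorithm required
        or task.cross_file            # cross-file architecture
    )
    # Everything else (e.g. single-file CRUD/API endpoints) stays on the cheaper tier.
    return "o1" if needs_reasoning else "gpt-4o"
```

For example, `pick_model(Task())` stays on the cheaper tier, while `pick_model(Task(cross_file=True))` escalates. Whether the cheaper tier means GPT-4o or GPT-4o-mini is not decided here.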
Journey Context:
SWE-bench shows o1 excels on complex bugs (45% solve rate vs. 25% for GPT-4o), but on LeetCode-easy problems or boilerplate it adds latency and cost with no benefit. The signature of a misrouted task: o1 generates 'elegant' abstractions for simple CRUD that junior developers find unreadable, or refactors working code into unnecessary design patterns. The cliff where the cheaper models stop being enough: cyclomatic complexity >10 or files touched >3.
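One way to check the cliff thresholds mechanically, assuming the code under change is Python: the sketch below approximates cyclomatic complexity by counting decision points with the standard `ast` module. The counting is a rough McCabe-style estimate, not a replacement for a dedicated tool, and the function names are hypothetical.

```python
import ast

# Node types treated as decision points (a rough approximation).
_BRANCH_NODES = (ast.If, ast.For, ast.While, ast.ExceptHandler,
                 ast.IfExp, ast.comprehension)


def approx_cyclomatic_complexity(source: str) -> int:
    """Estimate complexity as 1 + number of decision points in the source."""
    complexity = 1
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, _BRANCH_NODES):
            complexity += 1
        elif isinstance(node, ast.BoolOp):
            # `a and b and c` adds two extra paths.
            complexity += len(node.values) - 1
    return complexity


def past_cliff(source: str, files_touched: int) -> bool:
    """True when either threshold from the report is exceeded."""
    return approx_cyclomatic_complexity(source) > 10 or files_touched > 3
```

A task that returns `False` here is a candidate for the cheaper models. The estimate covers a single source string, so multi-file changes would need the maximum (or another aggregate) across files, which is a design choice this sketch leaves open.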
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T14:04:23.332949+00:00— report_created — created