Report #52380
[cost\_intel] Defaulting to reasoning models \(o1, R1\) for all coding tasks, paying roughly 10-30x more per request and accepting ~10x latency, when 90% of coding tasks are autocomplete or simple transformations
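Reserve o1-preview/o1-mini and DeepSeek-R1 for architectural decisions, complex debugging that requires reasoning across more than 5 files, or novel algorithm design. For implementation, refactoring, testing, and documentation, use Claude 3.5 Sonnet or GPT-4o. Cost difference: o1-preview lists at ~$15/1M input and ~$60/1M output tokens vs Sonnet at ~$3/1M input and ~$15/1M output, and o1's hidden reasoning tokens are billed as output, which widens the per-request gap well beyond the list-price ratio. Latency: o1 takes 10-30 seconds per response vs 2-5 seconds for Sonnet. Quality: On SWE-bench, Sonnet solves ~25% while o1 solves ~35%, so o1 buys roughly 10 extra points; for the tasks both models solve, Sonnet delivers the same result at a fraction of the cost.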
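To make the cost gap concrete, here is a minimal sketch of the per-request arithmetic. The prices are assumed list prices and the token counts are hypothetical; the `Pricing` class and `request_cost` helper are illustrative names, not part of any provider SDK. Substitute current figures before drawing conclusions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """USD per 1M tokens. The values used below are assumptions, not quotes."""
    input_per_m: float
    output_per_m: float


# Assumed list prices; check the providers' pricing pages before relying on them.
O1_PREVIEW = Pricing(input_per_m=15.0, output_per_m=60.0)
SONNET = Pricing(input_per_m=3.0, output_per_m=15.0)


def request_cost(p: Pricing, input_tokens: int, output_tokens: int,
                 reasoning_tokens: int = 0) -> float:
    """Cost of one request in USD; reasoning tokens are billed as output."""
    billed_output = output_tokens + reasoning_tokens
    return (input_tokens * p.input_per_m
            + billed_output * p.output_per_m) / 1_000_000


# Hypothetical request: 2,000 prompt tokens and 500 visible completion tokens,
# plus 3,000 hidden reasoning tokens on the reasoning model only.
o1_cost = request_cost(O1_PREVIEW, 2000, 500, reasoning_tokens=3000)
sonnet_cost = request_cost(SONNET, 2000, 500)
ratio = o1_cost / sonnet_cost

print(f"o1-preview: ${o1_cost:.4f}  Sonnet: ${sonnet_cost:.4f}  ratio: {ratio:.1f}x")
```

Under these assumptions the list-price ratio is only 4-5x per token, but the reasoning tokens push the per-request ratio into the 10-30x range, which is why the comparison is best made per request rather than per token.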
Journey Context:
The hype around reasoning models leads teams to route everything through them, and at scale the bill grows out of proportion to the quality gained. The key insight is task stratification: 'Thinking fast' \(System 1\) vs 'Thinking slow' \(System 2\). Code generation, style fixes, and straightforward refactoring are System 1 tasks: they are mostly pattern matching. Debugging a race condition across a distributed system is a System 2 task: it requires multi-step reasoning. Implementation pattern: Use a cheap router model \(Haiku\) to classify the complexity of each coding request, then route it to Sonnet \(standard\) or o1 \(complex\). Also, o1 is a particularly poor fit for 'tight loop' tasks that need quick iteration, because each round trip can take up to 30 seconds.
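The routing pattern can be sketched as follows. This is a minimal sketch under stated assumptions: `heuristic_classify` is a keyword-based stand-in for the cheap router model (in practice that call would go to a small model such as Haiku), the model identifiers are placeholders, and the hint list and the `interactive` flag are hypothetical choices made for illustration.

```python
from typing import Callable

# Placeholder model identifiers; substitute whatever your provider expects.
STANDARD_MODEL = "sonnet"
REASONING_MODEL = "o1"

# Hypothetical signals that a request needs multi-step (System 2) reasoning.
COMPLEX_HINTS = (
    "race condition",
    "deadlock",
    "architecture",
    "design an algorithm",
    "root cause",
)


def heuristic_classify(request: str, files_touched: int = 1) -> str:
    """Stand-in for the router model: returns 'complex' or 'simple'."""
    text = request.lower()
    # More than 5 files of context is treated as complex, per the guidance above.
    if files_touched > 5 or any(hint in text for hint in COMPLEX_HINTS):
        return "complex"
    return "simple"


def route(request: str, files_touched: int = 1,
          classify: Callable[[str, int], str] = heuristic_classify,
          interactive: bool = False) -> str:
    """Pick a model name for the request."""
    label = classify(request, files_touched)
    # Tight-loop (interactive) work stays on the fast model, since the
    # reasoning model's latency defeats quick iteration.
    if label == "complex" and not interactive:
        return REASONING_MODEL
    return STANDARD_MODEL


print(route("Rename this variable and add a docstring"))
print(route("Debug a race condition in the job scheduler"))
print(route("Update imports after the package move", files_touched=8))
```

Passing the classifier in as a callable keeps the routing rule separate from how complexity is judged, so the keyword heuristic can be swapped for a real model call without touching `route`.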
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T18:24:40.356907+00:00— report_created — created