Report #74073
[cost_intel] Code generation latency cliff: when does o3-mini's 15-30s generation time make it unusable for interactive coding assistants despite higher pass@1?
For code generation tasks under 200 lines with clear specifications, use Claude 3.5 Sonnet or GPT-4o with a 'think step by step' prompt. Reserve o3-mini for architectural decisions spanning more than 500 lines, complex concurrency bugs, or SWE-bench-style tasks that need more than 5 reasoning steps.
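The routing rule above can be sketched as a small function. This is only an illustration under stated assumptions: the `CodingTask` fields, the model labels, and the handling of the 200-500 line range (which the recommendation does not cover) are hypothetical choices, not part of the report.

```python
from dataclasses import dataclass

# Placeholder labels, not real API model identifiers (assumption).
FAST_MODEL = "fast-general-model"    # e.g. Claude 3.5 Sonnet or GPT-4o
REASONING_MODEL = "reasoning-model"  # e.g. o3-mini


@dataclass
class CodingTask:
    """Hypothetical task description; both fields are caller estimates."""
    expected_lines: int    # rough size of the code to be generated
    reasoning_steps: int   # rough number of dependent reasoning steps


def choose_model(task: CodingTask) -> str:
    """Route to the reasoning model only above the report's thresholds."""
    # Thresholds from the recommendation: >500 lines or >5 reasoning steps.
    if task.expected_lines > 500 or task.reasoning_steps > 5:
        return REASONING_MODEL
    # Assumption: everything else, including the unspecified 200-500 line
    # range, defaults to the faster model to stay interactive.
    return FAST_MODEL
```

For instance, `choose_model(CodingTask(expected_lines=40, reasoning_steps=1))` picks the fast model, while a task estimated at 8 reasoning steps is sent to the reasoning model regardless of size.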
Journey Context:
On SWE-bench Verified, o1-preview solves 48% of tasks versus 33% for Claude 3.5 Sonnet, but on LeetCode easy/medium problems the gap collapses to under 8%. Meanwhile, o3-mini latency reaches 15-30s versus 3s for Sonnet. The cost per correct solution on easy coding tasks is approximately $0.50 for o3-mini versus $0.02 for Sonnet. A common mistake is using reasoning models for requests like 'write a regex' or 'fix this syntax error', where no multi-step planning is needed. The latency cliff rules out synchronous UX: 95th-percentile latency above 10s causes 40% user abandonment in chat interfaces.
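The two checks in this paragraph, the cost gap and the 10s p95 latency threshold, can be expressed as a short sketch. The function names and the nearest-rank percentile method are assumptions made for illustration; the report does not say how its percentiles were measured.

```python
def cost_ratio(cost_a: float, cost_b: float) -> float:
    """How many times more expensive option A is than option B."""
    if cost_b <= 0:
        raise ValueError("cost_b must be positive")
    return cost_a / cost_b


def p95(latencies_s: list[float]) -> float:
    """95th percentile by the nearest-rank method (an assumed convention)."""
    if not latencies_s:
        raise ValueError("need at least one latency sample")
    ordered = sorted(latencies_s)
    # Integer ceiling of 0.95 * n, avoiding floating-point rounding.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def within_interactive_budget(latencies_s: list[float],
                              budget_s: float = 10.0) -> bool:
    """True if p95 latency stays at or under the report's 10s threshold."""
    return p95(latencies_s) <= budget_s
```

With the report's figures, `cost_ratio(0.50, 0.02)` comes out at roughly 25, meaning o3-mini costs about 25 times more per correct solution on easy tasks. The latency check is meant to run on measured samples from your own deployment.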
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T06:55:48.210901+00:00— report_created — created