Report #57478
[cost_intel] When is o1 worth 6x the cost of Claude 3.5 Sonnet for coding?
Use reasoning models such as o1 only when the task requires changes to more than 3 files OR involves code with cyclomatic complexity above 10; use Claude 3.5 Sonnet for single-file refactoring, boilerplate generation, and bug fixes within one function.
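A minimal sketch of the routing rule above, assuming the caller already knows how many files a task touches and the highest cyclomatic complexity among the affected functions. The `CodingTask` type, the `pick_model` function, and the returned labels are hypothetical names for illustration, not identifiers from any vendor API.

```python
from dataclasses import dataclass

# Thresholds taken from the rule of thumb in this report.
FILE_THRESHOLD = 3
COMPLEXITY_THRESHOLD = 10


@dataclass(frozen=True)
class CodingTask:
    """Hypothetical task description; both fields must be supplied by the caller."""
    files_changed: int
    max_cyclomatic_complexity: int


def pick_model(task: CodingTask) -> str:
    """Return a plain label for the model tier suggested by the rule.

    The labels are illustrative strings, not real API model IDs.
    """
    needs_reasoning = (
        task.files_changed > FILE_THRESHOLD
        or task.max_cyclomatic_complexity > COMPLEXITY_THRESHOLD
    )
    return "o1" if needs_reasoning else "claude-3.5-sonnet"


# Example: an interface change spread over 5 files vs. a one-function bug fix.
interface_change = CodingTask(files_changed=5, max_cyclomatic_complexity=6)
single_function_fix = CodingTask(files_changed=1, max_cyclomatic_complexity=4)
print(pick_model(interface_change))
print(pick_model(single_function_fix))
```

Both comparisons are strict, so a task touching exactly 3 files with complexity exactly 10 stays on the cheaper model.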
Journey Context:
On SWE-bench Verified, Claude 3.5 Sonnet (Oct 2024) achieves a 50.5% resolve rate versus o1's 48.9%, at 1/6 the cost and 1/10 the latency. o1's advantage emerges on cross-file dependency tasks (e.g., changing an interface and updating all implementations across 5 files), where Sonnet's pass@1 drops to 18%. The cost crossover sits at a McCabe cyclomatic complexity of 12. Below that, Sonnet with 2-pass sampling (generate, then test) is cheaper per resolved issue.
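The "cheaper per resolved issue" comparison can be sketched as cost per attempt divided by resolve rate. This assumes independent retries until one succeeds, and it ignores latency and human review time. The function name `cost_per_resolved_issue` and the `passes` parameter are hypothetical. Costs are in relative units, with one Sonnet attempt set to 1.0 and one o1 attempt to 6.0, and the rates are the SWE-bench Verified figures quoted above.

```python
def cost_per_resolved_issue(
    cost_per_attempt: float, resolve_rate: float, passes: int = 1
) -> float:
    """Expected spend to get one resolved issue.

    Assumes each attempt succeeds independently with probability
    `resolve_rate`, so the expected number of attempts is 1 / resolve_rate.
    `passes` multiplies the cost of each attempt (e.g. 2 for generate + test);
    the caller must supply the resolve rate that matches that setup.
    """
    if not 0.0 < resolve_rate <= 1.0:
        raise ValueError("resolve_rate must be in (0, 1]")
    if passes < 1:
        raise ValueError("passes must be at least 1")
    return cost_per_attempt * passes / resolve_rate


# Relative costs: Sonnet = 1 unit per attempt, o1 = 6 units per attempt.
sonnet_cost = cost_per_resolved_issue(cost_per_attempt=1.0, resolve_rate=0.505)
o1_cost = cost_per_resolved_issue(cost_per_attempt=6.0, resolve_rate=0.489)

print(f"Sonnet: {sonnet_cost:.2f} units per resolved issue")
print(f"o1:     {o1_cost:.2f} units per resolved issue")
print(f"o1 / Sonnet: {o1_cost / sonnet_cost:.2f}x")
```

Under these assumptions the benchmark-wide gap is slightly wider than the raw 6x price ratio, because Sonnet also resolves a marginally larger share of issues. The per-task crossover depends on resolve rates for the specific task class, which have to be measured separately.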
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T02:57:56.450423+00:00— report_created — created