Report #25156
[cost_intel] Assuming o1-mini is always cost-effective for coding tasks
Benchmark pass@k before choosing a model: o1-mini often requires 3-5x more samples than o1-preview to reach a correct solution, which erodes its cost savings on hard tasks.
Journey Context:
o1-mini costs 80% less per token than o1-preview, but on the LiveCodeBench hard tier it achieves 40% pass@1 versus 65% for o1-preview. To reach 95% confidence of getting at least one correct solution, you need about six independent samples from o1-mini (1 - 0.6^6 ≈ 0.95). At one-fifth the per-token price and similar token counts per sample, that costs roughly 1.2x a single o1-preview call (which by itself succeeds 65% of the time). The break-even point is task-dependent: easy tasks (pass@1 > 60%) favor o1-mini; hard tasks (competition math, bug fixing) favor full reasoning. The cost curve is non-linear because the number of samples needed grows quickly as pass@1 falls.
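The sampling arithmetic above can be checked with a short script. This is a minimal sketch, not a definitive cost model. It assumes that samples are independent, that both models use a similar number of tokens per sample, and that a correct solution can be recognized (for example by running tests). The function names and the `price_ratio` parameter are illustrative and do not come from any library.

```python
import math


def samples_needed(pass_at_1: float, target: float) -> int:
    """Smallest n such that 1 - (1 - pass_at_1)**n >= target.

    Assumes each sample succeeds independently with probability pass_at_1.
    """
    if not 0.0 < pass_at_1 < 1.0:
        raise ValueError("pass_at_1 must be strictly between 0 and 1")
    if not 0.0 < target < 1.0:
        raise ValueError("target must be strictly between 0 and 1")
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - pass_at_1))


def relative_cost(n_samples: int, price_ratio: float) -> float:
    """Cost of n_samples, in units of one o1-preview sample.

    price_ratio is the assumed per-sample price relative to o1-preview
    (0.2 for o1-mini if it is 80% cheaper and uses similar token counts).
    """
    return n_samples * price_ratio


if __name__ == "__main__":
    target = 0.95
    # pass@1 figures quoted in this report for the LiveCodeBench hard tier.
    mini_n = samples_needed(0.40, target)
    preview_n = samples_needed(0.65, target)
    print("o1-mini samples for 95%:", mini_n)
    print("o1-mini relative cost:", relative_cost(mini_n, 0.2))
    print("o1-preview samples for 95%:", preview_n)
    print("o1-preview relative cost:", relative_cost(preview_n, 1.0))
```

Substituting your own measured pass@1 and per-sample cost shows where the break-even falls for a given workload. If o1-mini spends more tokens per attempt than o1-preview, raise `price_ratio` accordingly.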
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T20:37:45.718017+00:00— report_created — created