Report #88139
[cost_intel] When does using o1-preview/o1-mini beat GPT-4o on cost-per-correct-answer?
o1 models cost 3-10x more per token than GPT-4o but use 50-70% fewer total tokens to reach a correct answer on reasoning tasks. Costs reach parity at a reasoning depth of roughly 600 tokens; o1 is cheaper per correct answer when GPT-4o needs more than 3 chain-of-thought steps or self-consistency voting.
Journey Context:
o1-preview is priced at $15 input / $60 output per 1M tokens, versus $5 / $15 for GPT-4o. However, o1 also spends hidden reasoning tokens (billed but not shown) to solve problems. On competition math (AIME), o1-mini achieves 70% accuracy versus 12% for GPT-4o. Reaching 70% accuracy with GPT-4o requires 5-10 attempts with voting (self-consistency), which consumes 5-10x the tokens. For hard reasoning (math, code contests, complex planning), o1 is therefore cheaper per correct result despite its higher per-token cost. The break-even point is roughly where GPT-4o needs more than 600 tokens of chain-of-thought or more than 3 sampling passes.
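The break-even reasoning above can be checked with a small calculation. The following is a minimal Python sketch, not a definitive cost model. The prices are the ones quoted in this report; the token counts, sample count, and accuracy values in the usage section are hypothetical placeholders, and the helper names (`Pricing`, `cost_per_call`, `cost_per_correct`, `break_even_output_tokens`) are illustrative only. It assumes every sample re-sends the full prompt, and that `accuracy` is the success rate of the whole procedure (a single call, or a voted batch of samples).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """List price in USD per 1M tokens."""
    input_per_million: float
    output_per_million: float


# Prices as quoted in this report; verify against current pricing before use.
GPT_4O = Pricing(input_per_million=5.0, output_per_million=15.0)
O1_PREVIEW = Pricing(input_per_million=15.0, output_per_million=60.0)


def cost_per_call(pricing: Pricing, input_tokens: float, output_tokens: float) -> float:
    """USD cost of one call.

    output_tokens should include any hidden reasoning tokens,
    because those are billed even though they are not shown.
    """
    return (input_tokens * pricing.input_per_million
            + output_tokens * pricing.output_per_million) / 1_000_000


def cost_per_correct(pricing: Pricing, input_tokens: float, output_tokens: float,
                     accuracy: float, samples: int = 1) -> float:
    """Expected USD spent per correct answer.

    samples is the number of calls per question (for self-consistency voting);
    accuracy is the fraction of questions answered correctly by the procedure.
    """
    if not 0.0 < accuracy <= 1.0:
        raise ValueError("accuracy must be in (0, 1]")
    if samples < 1:
        raise ValueError("samples must be at least 1")
    return samples * cost_per_call(pricing, input_tokens, output_tokens) / accuracy


def break_even_output_tokens(pricing: Pricing, input_tokens: float,
                             accuracy: float, target_cost: float) -> float:
    """Billed output tokens per call at which a single-call strategy
    costs exactly target_cost per correct answer."""
    budget_per_call = target_cost * accuracy
    return (budget_per_call * 1_000_000
            - input_tokens * pricing.input_per_million) / pricing.output_per_million


if __name__ == "__main__":
    # Hypothetical workload: 500-token prompt, both strategies assumed to reach 70%.
    gpt = cost_per_correct(GPT_4O, 500, 800, accuracy=0.70, samples=8)
    o1 = cost_per_correct(O1_PREVIEW, 500, 3000, accuracy=0.70)
    limit = break_even_output_tokens(O1_PREVIEW, 500, 0.70, target_cost=gpt)
    print(f"GPT-4o with 8-sample voting: ${gpt:.4f} per correct answer")
    print(f"o1-preview, single call:     ${o1:.4f} per correct answer")
    print(f"o1-preview break-even at {limit:.0f} billed output tokens")
```

With these placeholder inputs the outcome depends heavily on how many hidden reasoning tokens o1 actually bills, so it is worth substituting measured token counts and accuracies from your own workload rather than relying on the illustrative values.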
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T06:31:43.892931+00:00— report_created — created