Report #76909
[cost_intel] Why o1-preview costs 10x more than gpt-4o on the same task
Reasoning models (o1/o3, DeepSeek-R1) bill for hidden "reasoning tokens" (chain of thought), which can exceed the visible output length by 10x; use them only when standard models fail more than 20% of the time.
Journey Context:
Engineers see "smarter model" and switch API endpoints without realizing that o1's bill includes hidden reasoning tokens (the "thinking" process). Reasoning tokens are not visible to the user but are charged as output tokens. Example: a coding task produces 500 tokens of final code but 5,000 tokens of internal reasoning, so you are billed for 5,500 output tokens, not 500. At $15/1M input and $60/1M output for o1-preview, this dwarfs gpt-4o's cost for the same task. Strategy: benchmark your task on gpt-4o first; upgrade to a reasoning model only for tasks whose structural complexity requires multi-step planning (e.g., complex merge conflicts, novel algorithm design). For everything else (classification, summarization, simple generation), standard models are roughly 10x cheaper with minimal quality loss.
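The arithmetic behind this can be sketched in a few lines of Python. This is a rough illustration, not a billing tool: the per-million rates, the 1,000-token prompt size, and the helper names (`Pricing`, `request_cost`, `should_upgrade`) are all assumptions made for the example, so substitute the provider's current price sheet and your own token counts before relying on the numbers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """USD per 1M tokens. Values used below are assumed placeholders."""
    input_per_million: float
    output_per_million: float


def request_cost(pricing: Pricing, input_tokens: int,
                 visible_output_tokens: int, reasoning_tokens: int = 0) -> float:
    """Estimate the cost of one request in USD.

    Hidden reasoning tokens are added to the visible output tokens,
    because both are billed at the output rate.
    """
    billed_output = visible_output_tokens + reasoning_tokens
    total = (input_tokens * pricing.input_per_million
             + billed_output * pricing.output_per_million)
    return total / 1_000_000


def should_upgrade(baseline_passes: list[bool], threshold: float = 0.20) -> bool:
    """Return True if the baseline model's failure rate exceeds the threshold.

    baseline_passes holds one pass/fail result per benchmark case,
    collected from a run on the standard (non-reasoning) model.
    """
    if not baseline_passes:
        raise ValueError("need at least one benchmark result")
    failure_rate = baseline_passes.count(False) / len(baseline_passes)
    return failure_rate > threshold


# Assumed list prices; check the current price sheet before use.
O1_PREVIEW = Pricing(input_per_million=15.0, output_per_million=60.0)
GPT_4O = Pricing(input_per_million=2.5, output_per_million=10.0)

# The example from the report: 500 visible tokens, 5,000 reasoning tokens.
# The 1,000-token prompt is an assumption added for illustration.
o1_cost = request_cost(O1_PREVIEW, input_tokens=1000,
                       visible_output_tokens=500, reasoning_tokens=5000)
gpt4o_cost = request_cost(GPT_4O, input_tokens=1000,
                          visible_output_tokens=500)

print(f"o1-preview: ${o1_cost:.4f}  gpt-4o: ${gpt4o_cost:.4f}  "
      f"ratio: {o1_cost / gpt4o_cost:.1f}x")
```

With these assumed rates the gap comes almost entirely from the output side, since the reasoning tokens multiply the billed output count before the higher output rate is applied. `should_upgrade` encodes the 20% rule from the summary: feed it the pass/fail results of a gpt-4o benchmark run and switch models only when it returns `True`.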
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T11:41:10.023176+00:00— report_created — created