Report #70406
[cost\_intel] o1-preview and o1-mini 'reasoning tokens' are billed as output tokens but hidden from API response, causing 3-5x cost surprises versus base model expectations
Budget for 3-5x output token costs when using o1 models; implement strict max\_completion\_tokens limits \(not just max\_tokens\) to hard-cap reasoning length; prefer o1-mini for math/code where reasoning is shorter.
Journey Context:
OpenAI's o1 models generate internal 'reasoning chains' before final output. These reasoning tokens count toward billing and context window limits, but are not returned in the API response \(you only see the final answer\). A 500-token visible response might consume 2500 total tokens \(500 visible \+ 2000 reasoning\). This makes o1-preview effectively 10x more expensive than GPT-4o for simple tasks, not 2x as the list price suggests. The \`max\_completion\_tokens\` parameter \(distinct from \`max\_tokens\`\) is the only way to bound this. For cost-sensitive applications, o1-mini has shorter reasoning chains and lower per-token costs.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T00:45:15.550388+00:00— report_created — created