Report #71170
[cost\_intel] OpenAI o1-preview reasoning tokens costing 3x the visible output tokens and not shown in standard counters
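Read usage.completion\_tokens together with usage.completion\_tokens\_details.reasoning\_tokens in the API response. completion\_tokens already includes the reasoning tokens, so the difference between the two is the visible output. When using o1/o3 models, budget for reasoning tokens of 3-5x the visible output token count on top of the visible output itself. Where the API streams reasoning output, use it to detect runaway reasoning loops early.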
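A minimal sketch of the monitoring step, assuming the usage block is available as a plain dict shaped like the JSON response. The names TokenSplit and split\_completion\_tokens are hypothetical helpers, not part of any SDK, and the numbers in example\_usage are illustrative only.

```python
from dataclasses import dataclass
from typing import Any, Mapping


@dataclass(frozen=True)
class TokenSplit:
    visible: int    # tokens that appear in the returned text
    reasoning: int  # hidden reasoning tokens, billed as output
    billed: int     # completion_tokens as reported (visible + reasoning)


def split_completion_tokens(usage: Mapping[str, Any]) -> TokenSplit:
    """Split reported completion tokens into visible and reasoning parts.

    Missing or null fields are treated as zero, so responses from models
    without reasoning tokens still produce a sensible result.
    """
    billed = int(usage.get("completion_tokens") or 0)
    details = usage.get("completion_tokens_details") or {}
    reasoning = int(details.get("reasoning_tokens") or 0)
    visible = max(billed - reasoning, 0)
    return TokenSplit(visible=visible, reasoning=reasoning, billed=billed)


# Illustrative usage block (made-up numbers, same shape as the API's usage object).
example_usage = {
    "prompt_tokens": 50,
    "completion_tokens": 5000,
    "completion_tokens_details": {"reasoning_tokens": 4000},
}

split = split_completion_tokens(example_usage)
print(split)
```

If the SDK returns a typed object rather than a dict, it would need converting first (for example via a model-dump style method, if the SDK provides one). Logging all three numbers per request makes the hidden share visible in dashboards that otherwise only chart the total.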
Journey Context:
OpenAI's o1 and o3 reasoning models generate internal reasoning tokens \(a hidden chain of thought\) that never appear in the final output but are billed as output tokens. These tokens can exceed the visible output by 3x to 5x. Counting the returned text locally with a tokenizer library such as tiktoken sees only the visible tokens, so estimates built that way understate the bill and cause bill shock. For example, a response whose visible text is about 1000 tokens might carry 4000 reasoning tokens, for 5000 billed completion\_tokens. The API exposes completion\_tokens\_details.reasoning\_tokens in the usage object of the response, but many SDKs and monitoring tools may not yet capture this field. Reasoning models also have higher latency and may enter 'thinking loops', generating excessive reasoning tokens for simple queries.
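The 3x to 5x overhead described above can be turned into a rough budget range and a simple runaway check. This is a sketch under stated assumptions: the multipliers come from this report rather than from any official figure, the function names are hypothetical, and the max\_ratio threshold is an arbitrary starting point to tune against real traffic.

```python
from typing import Tuple


def billed_output_range(
    visible_tokens: int,
    low_mult: float = 3.0,   # assumed lower bound: reasoning = 3x visible
    high_mult: float = 5.0,  # assumed upper bound: reasoning = 5x visible
) -> Tuple[int, int]:
    """Return (low, high) billed output tokens: visible plus reasoning."""
    if visible_tokens < 0:
        raise ValueError("visible_tokens must be non-negative")
    low = round(visible_tokens * (1 + low_mult))
    high = round(visible_tokens * (1 + high_mult))
    return low, high


def looks_like_runaway(
    reasoning_tokens: int,
    visible_tokens: int,
    max_ratio: float = 5.0,  # assumed alert threshold, not an official limit
) -> bool:
    """Flag responses whose reasoning exceeds max_ratio times the visible output."""
    # Treat an empty visible answer as one token to avoid a zero baseline.
    return reasoning_tokens > max_ratio * max(visible_tokens, 1)


low, high = billed_output_range(1000)
print(f"expect {low} to {high} billed output tokens for 1000 visible tokens")
print(looks_like_runaway(reasoning_tokens=9000, visible_tokens=1000))
```

The check runs after a response completes, so it detects runaway reasoning rather than preventing it. A hard cap on completion tokens in the request, where the API offers one, is the complementary control.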
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T02:02:18.213870+00:00— report_created — created