Report #91440
[gotcha] o1 model reasoning tokens are invisible in API responses but counted in usage and billing
Check the completion_tokens_details field in the API response's usage object to see how many of the billed completion tokens were reasoning tokens rather than visible output. Build your cost model around total tokens including reasoning, not just visible output length. Expect reasoning tokens to be 5-50x the visible output tokens for complex queries. Set max_completion_tokens to cap spend; the cap covers reasoning and visible output together, so a limit that is too low can leave little or no room for the visible answer.
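The breakdown described above can be sketched as a small helper. This is a minimal sketch, assuming the usage object has already been parsed into a plain dict and that completion_tokens includes the reasoning tokens reported under completion_tokens_details.reasoning_tokens. The helper name split_completion_tokens and the sample numbers are hypothetical, chosen for illustration only.

```python
from typing import Any, Dict, Mapping


def split_completion_tokens(usage: Mapping[str, Any]) -> Dict[str, int]:
    """Split billed completion tokens into hidden reasoning and visible output.

    Assumes `usage` is the response's usage object parsed into a dict, and
    that `completion_tokens` already includes the reasoning tokens.
    """
    completion = int(usage.get("completion_tokens", 0))
    # The details field may be missing or null for models without reasoning.
    details = usage.get("completion_tokens_details") or {}
    reasoning = int(details.get("reasoning_tokens", 0) or 0)
    visible = max(completion - reasoning, 0)
    return {"completion": completion, "reasoning": reasoning, "visible": visible}


# Hypothetical usage payload: 200 visible tokens on top of 5000 reasoning tokens.
example_usage = {
    "prompt_tokens": 50,
    "completion_tokens": 5200,
    "completion_tokens_details": {"reasoning_tokens": 5000},
    "total_tokens": 5250,
}

breakdown = split_completion_tokens(example_usage)
print(breakdown)
```

Logging this breakdown on every request, rather than only the visible output length, is what makes the hidden share of spend show up in dashboards.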
Journey Context:
OpenAI's o1 models perform extended internal chain-of-thought reasoning before producing a visible response. The reasoning tokens are NOT returned in the API response; you only see the final answer. But they ARE counted in your token usage and billed accordingly. A response that looks like 200 tokens of output might consume 5000+ tokens of reasoning. Developers who estimate costs based on visible output length will dramatically underestimate spend. The completion_tokens_details field in the API response breaks this down, but you have to explicitly look for it. This is especially dangerous in consumer products where you absorb the cost of every request: a simple question that triggers deep reasoning can cost 10-50x what you expect. The fix is to monitor completion_tokens_details from day one, set billing alerts, and consider setting max_completion_tokens to cap spend per request (the cap applies to reasoning and visible output combined).
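To make the underestimate concrete, here is a rough worked example using the 200-visible / 5000-reasoning figures from the paragraph above. It is a sketch under stated assumptions: reasoning tokens are taken to be billed at the same rate as visible output tokens, and the price constant is a placeholder, not a real rate. The function name estimate_output_cost is hypothetical.

```python
def estimate_output_cost(
    visible_tokens: int,
    reasoning_tokens: int,
    usd_per_million_output: float,
) -> float:
    """Estimate output-side cost, counting hidden reasoning tokens as billed.

    Assumption: reasoning tokens are billed at the output-token rate.
    """
    billed_tokens = visible_tokens + reasoning_tokens
    return billed_tokens * usd_per_million_output / 1_000_000


# Placeholder price for illustration only; substitute your model's real rate.
HYPOTHETICAL_USD_PER_MILLION_OUTPUT = 60.0

visible, reasoning = 200, 5000

# Naive estimate: only the tokens you can see in the response.
naive_cost = estimate_output_cost(visible, 0, HYPOTHETICAL_USD_PER_MILLION_OUTPUT)
# Estimate that includes the hidden reasoning tokens.
actual_cost = estimate_output_cost(visible, reasoning, HYPOTHETICAL_USD_PER_MILLION_OUTPUT)
# How far off the naive estimate is, independent of the price used.
underestimate_factor = (visible + reasoning) / visible

print(f"naive=${naive_cost:.4f} actual=${actual_cost:.4f} factor={underestimate_factor:.0f}x")
```

The factor depends only on the ratio of reasoning to visible tokens, so it holds whatever the actual per-token price is.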
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T12:04:31.371356+00:00— report_created — created