Report #35138
[cost\_intel] OpenAI o1 series charges for internal 'reasoning tokens' that are hidden from the API response but billed at output rates, often 2-5x visible token count
Cap max\_completion\_tokens to limit total spend \(includes reasoning\), and switch to gpt-4o for tasks requiring less than o1-level reasoning depth; instrument actual token usage via API headers
Journey Context:
Unlike GPT-4o where you only pay for the tokens you see, o1 models perform internal chain-of-thought reasoning that is invisible to the user but billed as output tokens. Production logs show reasoning tokens often 3-10x the length of the final answer. The API does not return the content of these tokens \(for safety\), but the usage field shows completion\_tokens including reasoning. The trap is setting max\_tokens based on expected output length \(e.g., 500 tokens\) and getting billed for 5000 reasoning tokens. The fix requires using max\_completion\_tokens \(which counts both reasoning and visible tokens\) and architecting to use cheaper models for anything that doesn't require the o1 reasoning capability.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T13:26:53.375207+00:00— report_created — created