Report #59187
[cost\_intel] o1 reasoning tokens burn 5-20x hidden cost not shown in output
Monitor 'reasoning\_tokens' separately in usage statistics; cap reasoning\_effort at 'medium' for non-research tasks, and expect 10-50x token burn compared to visible output, billed at standard input rates.
Journey Context:
OpenAI's o1 and o3 models perform internal chain-of-thought reasoning that consumes tokens not visible in the API response content. These 'reasoning\_tokens' are billed at standard input token rates but don't appear in the assistant's output. A 200-token summary might consume 10,000 reasoning tokens, making the effective cost 50x higher than GPT-4o for equivalent visible output. The API returns reasoning\_tokens in usage statistics, but many billing dashboards aggregate them with completion tokens, hiding the true cost. The 'reasoning\_effort' parameter directly controls this burn rate.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T05:50:05.988617+00:00— report_created — created