Report #90198
[cost_intel] Hidden cost multiplier in reasoning models beyond per-token pricing
Budget for 3-5x more total tokens (input + reasoning + output) when using o1 instead of GPT-4o, because of internal reasoning chains; a 1k input/500 output task can grow to about 8k total tokens.
Journey Context:
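Reasoning models generate hidden 'thinking' tokens that are not returned in the response text but are still billed, and are reported in the 'reasoning_tokens' field of the usage data. In practice, o1 uses 3-5x more total tokens than the visible input+output would suggest. For example, a coding task with 2k input tokens and 1k output tokens incurs ~6k reasoning tokens, so the billed total is 9k tokens, 3x the 3k visible tokens. This multiplier stacks on top of the per-token rate difference: a team that budgets only for the 30x base rate difference and then sees roughly 3x the tokens ends up near 100x. This matters for budget forecasting, because the naive estimate omits the reasoning tokens entirely. Monitor the 'reasoning_tokens' count in API responses to track this (in the Chat Completions API it sits under 'usage.completion_tokens_details.reasoning_tokens').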
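The arithmetic above can be sketched in a few lines of Python. This is a minimal illustration, not an official client: it assumes a usage payload shaped like the Chat Completions response, where 'completion_tokens' already includes the reasoning tokens, and the helper names (`token_breakdown`, `estimate_cost`) and the per-million prices passed in are hypothetical placeholders to be replaced with current published rates.

```python
def token_breakdown(usage: dict) -> dict:
    """Split a usage payload into visible and hidden (reasoning) tokens.

    Assumes a Chat Completions-style payload in which completion_tokens
    already includes the reasoning tokens.
    """
    details = usage.get("completion_tokens_details") or {}
    reasoning = details.get("reasoning_tokens", 0)
    prompt = usage["prompt_tokens"]
    completion = usage["completion_tokens"]

    visible_output = completion - reasoning
    visible_total = prompt + visible_output
    billed_total = prompt + completion
    return {
        "reasoning_tokens": reasoning,
        "visible_total": visible_total,
        "billed_total": billed_total,
        # How many times larger the bill is than the visible tokens suggest.
        "multiplier": billed_total / visible_total if visible_total else float("nan"),
    }


def estimate_cost(usage: dict, input_price_per_m: float, output_price_per_m: float) -> float:
    """Estimate request cost; reasoning tokens are charged at the output rate.

    Prices are per million tokens and are caller-supplied placeholders.
    """
    return (
        usage["prompt_tokens"] * input_price_per_m
        + usage["completion_tokens"] * output_price_per_m
    ) / 1_000_000


# The coding task from this report: 2k input, 1k visible output, ~6k reasoning.
example_usage = {
    "prompt_tokens": 2000,
    "completion_tokens": 7000,  # 1k visible + 6k reasoning
    "completion_tokens_details": {"reasoning_tokens": 6000},
}
```

Calling `token_breakdown(example_usage)` reports 9000 billed tokens against 3000 visible ones, a multiplier of 3.0. Logging that multiplier per request is one way to check whether the 3-5x figure holds for a given workload before committing to a forecast.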
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T09:59:36.883283+00:00— report_created — created