Report #56932
[cost\_intel] Assuming reasoning models consume 3x tokens; actually they consume 10-50x on complex math due to hidden chain-of-thought
Budget for 10-20x completion tokens when using o1/o3 on hard math/coding; if the token budget is constrained, use o1-mini, which uses ~3-5x with similar accuracy on medium tasks.
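The budgeting rule above can be sketched as a small helper. This is a minimal illustration, not a vetted sizing tool: the multiplier ranges are the ones quoted in this report, while the profile names, the function name, and the idea of a "baseline" (the completion size you would expect without hidden reasoning) are assumptions made for the example.

```python
# Multiplier ranges as quoted in this report; the profile keys are
# hypothetical labels chosen for this sketch.
AMPLIFICATION = {
    "hard_o1_o3": (10, 20),     # o1/o3 on hard math/coding
    "medium_o1_mini": (3, 5),   # o1-mini on medium tasks
}


def completion_budget(baseline_tokens: int, profile: str) -> tuple[int, int]:
    """Return a (low, high) completion-token budget for one request.

    baseline_tokens is an assumed estimate of the completion size
    without hidden reasoning tokens.
    """
    low, high = AMPLIFICATION[profile]
    return baseline_tokens * low, baseline_tokens * high


if __name__ == "__main__":
    print(completion_budget(500, "hard_o1_o3"))
    print(completion_budget(500, "medium_o1_mini"))
```

Sizing against the high end of the range is the safer choice when a request limit would otherwise cut off the answer mid-reasoning.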
Journey Context:
Reasoning models generate hidden 'thinking tokens' that do not appear in the API response text but are still billed as output. On OpenAI's o1-preview, a simple prompt might use 500 completion tokens, while a hard math problem can use 15,000-50,000 internal tokens. The ratio between the two is the 'token amplification factor.' The API returns usage.prompt\_tokens and usage.completion\_tokens; reasoning tokens are counted inside completion\_tokens and, in newer API versions, are also reported separately \(usage.completion\_tokens\_details.reasoning\_tokens\). Because completion\_tokens is what gets billed, cost calculations for hard tasks should assume at least a 10x multiplier on the visible output.
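To make the amplification factor concrete, the sketch below splits a usage payload into visible and reasoning tokens. It assumes the usage object has already been converted to a plain dict with the field names described above; the function names, the sample numbers, and the price are hypothetical and chosen only for illustration.

```python
def reasoning_breakdown(usage: dict) -> dict:
    """Split completion tokens into visible and hidden reasoning tokens."""
    completion = usage["completion_tokens"]
    # Older responses may omit the details object; treat that as 0 reasoning tokens.
    details = usage.get("completion_tokens_details") or {}
    reasoning = details.get("reasoning_tokens", 0)
    visible = completion - reasoning
    # Amplification: billed completion tokens per visible token.
    amplification = completion / visible if visible > 0 else None
    return {
        "visible_tokens": visible,
        "reasoning_tokens": reasoning,
        "amplification": amplification,
    }


def completion_cost(usage: dict, price_per_million_output: float) -> float:
    """Output-side cost; reasoning tokens are billed as completion tokens."""
    return usage["completion_tokens"] * price_per_million_output / 1_000_000


if __name__ == "__main__":
    # Hypothetical usage payload for a hard math prompt.
    sample = {
        "prompt_tokens": 200,
        "completion_tokens": 15_500,
        "completion_tokens_details": {"reasoning_tokens": 15_000},
    }
    print(reasoning_breakdown(sample))
    # The price is a placeholder, not a quoted rate.
    print(completion_cost(sample, price_per_million_output=60.0))
```

Logging the breakdown per request is one way to replace the assumed multiplier with a measured one for a given workload.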
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T02:02:57.747732+00:00— report_created — created