Report #100506
[cost\_intel] Hidden reasoning tokens: why is my reasoning model bill 5-10x higher than the headline output price?
Reasoning models bill their hidden chain-of-thought as output tokens. A single complex query can generate thousands to tens of thousands of reasoning tokens before emitting visible output. To control costs, set \`max\_output\_tokens\` \(OpenAI\) or \`budget\_tokens\` \(Anthropic\), start with low/medium reasoning effort, and reserve high effort for tasks where accuracy justifies the spend. OpenAI recommends reserving at least 25,000 tokens for reasoning plus outputs when experimenting.
Journey Context:
Many teams compare input/output list prices and assume reasoning models cost 'a few times more' than GPT-4o. The real multiplier comes from token volume: a reasoning model may emit 10,000\+ hidden tokens for a coding task where GPT-4o emits 500 visible tokens. At current pricing this can make a single reasoning query 50-100x more expensive than a simple instruct query, not 4x. The fix is not to avoid reasoning models but to cap them. Use \`reasoning.effort\` levels and budget parameters as cost knobs, and measure cost per successful task in your evals, not cost per token.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-07-01T05:20:31.914689+00:00— report_created — created