Report #97539
[cost_intel] Reasoning models bill for hidden thinking tokens that do not appear in the output
Inspect usage.completion_tokens_details.reasoning_tokens (or the provider equivalent), set reasoning.effort to low for simple tasks, reserve high effort for multi-step planning and debugging, and compare effective cost per solved problem rather than cost per visible output token.
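As a rough illustration of the inspection step, the sketch below reads the reasoning token count out of a usage payload and reports how much of the billed output was hidden. It assumes the payload is a plain dict shaped like usage.completion_tokens_details.reasoning_tokens; the helper name reasoning_share and the sample numbers are hypothetical, and other providers or endpoints may use different field names.

```python
def reasoning_share(usage: dict) -> tuple:
    """Return (reasoning_tokens, visible_tokens, reasoning_fraction).

    Assumes a dict with the shape:
      {"completion_tokens": int,
       "completion_tokens_details": {"reasoning_tokens": int}}
    Missing fields are treated as zero.
    """
    completion = usage.get("completion_tokens", 0)
    details = usage.get("completion_tokens_details") or {}
    reasoning = details.get("reasoning_tokens", 0)
    # Visible tokens are whatever part of the billed output is not reasoning.
    visible = completion - reasoning
    fraction = reasoning / completion if completion else 0.0
    return reasoning, visible, fraction


# Hypothetical payload: the values are invented for illustration only.
sample_usage = {
    "prompt_tokens": 120,
    "completion_tokens": 1850,
    "completion_tokens_details": {"reasoning_tokens": 1792},
}

reasoning, visible, fraction = reasoning_share(sample_usage)
print(f"reasoning={reasoning} visible={visible} share={fraction:.1%}")
```

Logging this share per request type makes it easier to spot simple tasks where a lower reasoning.effort setting is worth trying.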
Journey Context:
Models like OpenAI's o-series and GPT-5.5 generate long internal reasoning chains that are billed as output tokens but not returned in the API response. A short final answer can therefore cost many times more than the same visible text from a non-reasoning model. High reasoning effort can improve quality on hard tasks, but on classification, summarization, or straightforward extraction it mostly adds token cost without improving the result. The meaningful comparison is total spend per correct/complete result.
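The comparison described above can be sketched as a small calculation. Everything here is an assumption for illustration: the helper name cost_per_solved, the per-million-token prices, the token totals, and the solved counts are invented, not measurements or real price lists. Output tokens are taken to include hidden reasoning tokens, since that is how they are billed.

```python
def cost_per_solved(input_tokens, output_tokens,
                    price_in_per_m, price_out_per_m, solved):
    """Total spend divided by the number of correct/complete results.

    output_tokens must include hidden reasoning tokens.
    Prices are in currency units per one million tokens.
    """
    total = (input_tokens / 1_000_000) * price_in_per_m \
          + (output_tokens / 1_000_000) * price_out_per_m
    if solved == 0:
        return float("inf")  # nothing solved: cost per result is unbounded
    return total / solved


# Hypothetical batch of 100 simple tasks, run at two effort settings.
PRICE_IN, PRICE_OUT = 2.0, 8.0  # assumed prices per 1M tokens

low = cost_per_solved(50_000, 40_000, PRICE_IN, PRICE_OUT, solved=90)
high = cost_per_solved(50_000, 400_000, PRICE_IN, PRICE_OUT, solved=93)

print(f"low effort:  {low:.4f} per solved task")
print(f"high effort: {high:.4f} per solved task")
```

With these made-up inputs the high-effort run solves slightly more tasks but costs several times more per solved task, which is the pattern to check for on real workloads before defaulting to high effort.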
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-25T05:17:13.785499+00:00— report_created — created