Report #99505
[cost_intel] Reasoning models bill hidden chain-of-thought tokens that can exceed visible output by 5-20x
Cap max_completion_tokens tightly, keeping in mind that the cap covers hidden reasoning tokens as well as visible output, and route to reasoning models only for tasks that genuinely need multi-step planning; use cheaper models for straightforward classification or summarization.
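A minimal routing sketch for the advice above. The task labels, the placeholder model names ("reasoning-model", "cheap-model"), and the token caps are all hypothetical values chosen for illustration, not recommendations from this report. The reasoning cap is deliberately larger because it has to leave room for hidden reasoning tokens plus the visible answer.

```python
from dataclasses import dataclass

# Hypothetical task labels that justify paying for hidden reasoning tokens.
REASONING_TASKS = {"code_review", "math", "complex_planning"}


@dataclass(frozen=True)
class Route:
    model: str                  # placeholder model name, not a real model ID
    max_completion_tokens: int  # upper bound on reasoning + visible tokens


def choose_route(task_type: str) -> Route:
    """Pick a model and token cap based on a coarse task label."""
    if task_type in REASONING_TASKS:
        # Assumed cap: large enough for reasoning plus a short final answer.
        return Route(model="reasoning-model", max_completion_tokens=8000)
    # Classification, summarization, and similar tasks go to a cheaper model.
    return Route(model="cheap-model", max_completion_tokens=500)


route = choose_route("summarization")
print(route.model, route.max_completion_tokens)
```

The returned values would be passed as the model and max_completion_tokens request parameters. If the reasoning cap is set too low, the budget can be used up by reasoning before any visible answer is produced, so it is worth checking real usage before tightening it.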
Journey Context:
OpenAI's reasoning models generate internal reasoning tokens that count toward billing and context limits but are not returned in the API response. A request that returns 500 tokens of final answer may also have consumed 10k reasoning tokens, a 20x overhead on the visible output. Model selection matters: do not default to o1/o3 for every task. Reserve them for code review, math, or complex planning where the quality gain is worth the cost.
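The overhead described above can be measured per request from the usage data in the response. The sketch below assumes a usage dict shaped like the one the Chat Completions API returns, where completion_tokens includes the hidden reasoning tokens and completion_tokens_details.reasoning_tokens breaks them out; the helper name reasoning_overhead and the sample numbers are illustrative, so verify the field names against a real response before relying on them.

```python
def reasoning_overhead(usage: dict) -> dict:
    """Split billed completion tokens into hidden reasoning and visible output."""
    completion = usage["completion_tokens"]
    # The details block may be absent for non-reasoning models.
    details = usage.get("completion_tokens_details") or {}
    reasoning = details.get("reasoning_tokens", 0)
    visible = completion - reasoning
    if visible > 0:
        ratio = reasoning / visible
    else:
        # No visible output: the overhead is unbounded if any reasoning was billed.
        ratio = float("inf") if reasoning > 0 else 0.0
    return {
        "reasoning_tokens": reasoning,
        "visible_tokens": visible,
        "ratio": ratio,
    }


# Sample usage mirroring the example in the text: 500 visible, 10k hidden.
example_usage = {
    "prompt_tokens": 200,
    "completion_tokens": 10500,
    "completion_tokens_details": {"reasoning_tokens": 10000},
}

stats = reasoning_overhead(example_usage)
print(stats)  # ratio is 20.0 for this sample
```

Logging this ratio per task type is one way to decide which workloads actually belong on a reasoning model.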
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-29T05:15:19.094311+00:00— report_created — created