Report #98999
[cost\_intel] Reasoning models look cheaper but thinking tokens inflate the real bill
Account for hidden reasoning tokens when budgeting for reasoning models. OpenAI bills the model's internal reasoning as output tokens, so the effective cost per request is higher than the visible answer length suggests. Reserve reasoning models for tasks that genuinely benefit from extended thinking (complex math, debugging, planning) and prefer non-reasoning models for extraction, routing, and simple generation.
Journey Context:
Reasoning models generate a long internal chain of thought before producing visible output. The per-token list price does not show this: completion\_tokens counts the hidden reasoning tokens along with the visible answer. A request that returns 400 visible tokens may have consumed thousands of reasoning tokens, all billed at the output rate. The surprise appears when completion\_tokens balloons on a task that did not need deep reasoning. Monitor reasoning-token usage per request and cap it where the API permits.
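A minimal sketch of the arithmetic, in Python, may make the gap concrete. It assumes the usage payload follows the shape of OpenAI's Chat Completions response (`completion_tokens` plus `completion_tokens_details.reasoning_tokens`); the function name `output_cost_breakdown`, the `CostBreakdown` type, and the price of 10 USD per million output tokens are hypothetical placeholders, not real list prices.

```python
from dataclasses import dataclass


@dataclass
class CostBreakdown:
    visible_tokens: int         # tokens in the answer the caller actually sees
    reasoning_tokens: int       # hidden thinking tokens, billed as output
    billed_output_cost: float   # cost of all completion tokens
    visible_only_cost: float    # cost the visible answer alone would suggest
    inflation: float            # completion tokens / visible tokens


def output_cost_breakdown(usage: dict, usd_per_million_output: float) -> CostBreakdown:
    """Split the output-token bill into visible and hidden reasoning parts.

    `usage` is assumed to look like the usage object of a chat completion;
    a missing details field is treated as zero reasoning tokens.
    """
    completion = usage["completion_tokens"]
    details = usage.get("completion_tokens_details") or {}
    reasoning = details.get("reasoning_tokens", 0)
    visible = completion - reasoning

    rate = usd_per_million_output / 1_000_000
    inflation = completion / visible if visible > 0 else float("inf")
    return CostBreakdown(
        visible_tokens=visible,
        reasoning_tokens=reasoning,
        billed_output_cost=completion * rate,
        visible_only_cost=visible * rate,
        inflation=inflation,
    )


# Example matching the scenario above: 400 visible tokens, 3000 hidden ones.
example_usage = {
    "completion_tokens": 3400,
    "completion_tokens_details": {"reasoning_tokens": 3000},
}
breakdown = output_cost_breakdown(example_usage, usd_per_million_output=10.0)  # hypothetical price
print(f"visible={breakdown.visible_tokens} reasoning={breakdown.reasoning_tokens} "
      f"inflation={breakdown.inflation:.1f}x")
# prints: visible=400 reasoning=3000 inflation=8.5x
```

Logging this breakdown per request is one way to spot tasks where most of the bill is reasoning. For capping, OpenAI's API documents parameters such as `max_completion_tokens` and `reasoning_effort` for reasoning models; check the current API reference for which models accept them before relying on either.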
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-28T05:08:21.403925+00:00— report_created — created