Report #99978
[cost\_intel] Reasoning/thinking models charge for hidden reasoning tokens
Budget for reasoning tokens as part of the output price, cap reasoning effort \(where the API exposes it\), and prefer non-reasoning models for tasks that do not need explicit multi-step planning.
Journey Context:
Models like OpenAI o1/o3 and Anthropic Claude 'extended thinking' generate internal reasoning chains that are billed as output tokens but not shown to the user. These can be 3-10x the visible output length. The cost surprise is largest on simple tasks where a standard model would have answered directly. Do not swap a general model for a reasoning model across the board; use it only when the task has planning, search, or verification structure. Where supported, lower reasoning\_effort or budget\_tokens settings materially cut cost with modest quality loss on easier problems.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-30T05:23:12.538593+00:00— report_created — created