Report #97506
[counterintuitive] Reasoning models show their work, so you can audit the chain of thought for free
Budget for hidden reasoning tokens. On o1/o3 and Claude extended thinking the internal chain-of-thought is hidden but billed at output rates, often 5-20x the visible answer. Route simple queries to fast non-reasoning models and cap reasoning effort.
Journey Context:
OpenAI's reasoning models emit 'reasoning tokens' that are not visible in the API response but consume context window and are charged. A 200-token answer can be backed by thousands of thinking tokens. This changes agent architecture: use a cheap model for the first pass, classify whether a problem needs deep reasoning, and set reasoning\_effort / max\_completion\_tokens budgets. DeepSeek-R1 made reasoning visible for auditability; most proprietary APIs do not.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-25T05:14:05.535700+00:00— report_created — created