Report #98074
[cost\_intel] Reasoning models are used for every request without considering hidden reasoning-token cost and latency
Use reasoning models only for multi-step planning, complex debugging, scientific reasoning, or long-horizon agents. For chat, classification, translation, and simple RAG, use a non-reasoning model such as gpt-5.4 or gpt-4o. Tune reasoning effort and cap max\_output\_tokens to avoid runaway bills.
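A minimal sketch of that routing rule follows. It assumes the caller already knows the task type. The function name `choose_request_params`, the task labels, the token caps, and the choice of gpt-5.5 as the reasoning model are illustrative assumptions, not values from this report. The returned dict only mirrors the `reasoning` effort and `max_output_tokens` settings mentioned above, and no API call is made.

```python
# Task categories follow the report; the labels themselves are hypothetical.
REASONING_TASKS = {
    "multi_step_planning",
    "complex_debugging",
    "scientific_reasoning",
    "long_horizon_agent",
}


def choose_request_params(task_type: str) -> dict:
    """Pick model settings for a task (hypothetical helper, illustrative values)."""
    if task_type in REASONING_TASKS:
        return {
            "model": "gpt-5.5",               # assumed to be the reasoning model
            "reasoning": {"effort": "medium"},  # tune per workload
            "max_output_tokens": 8000,          # assumed cap; covers reasoning tokens too
        }
    # Chat, classification, translation, simple RAG, and anything unrecognized
    # fall back to the cheaper non-reasoning model with a tight output cap.
    return {"model": "gpt-5.4", "max_output_tokens": 1000}
```

Defaulting unknown task types to the non-reasoning model keeps the expensive path opt-in, so a mislabeled request cannot silently incur reasoning-token cost.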
Journey Context:
Reasoning tokens are billed as output tokens and can range from a few hundred to tens of thousands. At current pricing, gpt-5.5 output is $30/M tokens versus $15/M for gpt-5.4, and higher reasoning effort multiplies the token count, so per-query cost can easily rise 2-10x. The quality gain is real on hard reasoning and coding tasks but wasted on pattern-matching tasks. Monitor output\_tokens\_details.reasoning\_tokens to see how much of each bill is hidden reasoning.
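The arithmetic above can be sketched as follows, using the two output prices quoted in this report. The helper names and the sample token counts are hypothetical. The usage dicts are plain Python dicts assumed to mirror the `output_tokens` and `output_tokens_details.reasoning_tokens` fields, with reasoning tokens counted inside `output_tokens`. Input-token cost is ignored.

```python
# Output prices in USD per million tokens, as quoted in the report.
PRICE_PER_M_OUTPUT = {"gpt-5.5": 30.0, "gpt-5.4": 15.0}


def output_cost_usd(model: str, usage: dict) -> float:
    """Output-side cost only; reasoning tokens are already part of output_tokens."""
    return usage["output_tokens"] * PRICE_PER_M_OUTPUT[model] / 1_000_000


def reasoning_share(usage: dict) -> float:
    """Fraction of billed output tokens that were hidden reasoning tokens."""
    details = usage.get("output_tokens_details") or {}
    total = usage.get("output_tokens", 0)
    return details.get("reasoning_tokens", 0) / total if total else 0.0


# Made-up sample: the same 500-token answer, with and without 750 reasoning tokens.
reasoning_usage = {"output_tokens": 1250, "output_tokens_details": {"reasoning_tokens": 750}}
plain_usage = {"output_tokens": 500, "output_tokens_details": {"reasoning_tokens": 0}}

cost_reasoning = output_cost_usd("gpt-5.5", reasoning_usage)
cost_plain = output_cost_usd("gpt-5.4", plain_usage)
print(f"reasoning: ${cost_reasoning:.4f}, plain: ${cost_plain:.4f}, "
      f"ratio: {cost_reasoning / cost_plain:.1f}x, "
      f"reasoning share: {reasoning_share(reasoning_usage):.0%}")
```

With these sample numbers the reasoning request costs about five times the plain one, which sits inside the 2-10x range described above. Logging the reasoning share per request is one way to spot workloads where most of the spend is hidden reasoning.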
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-26T05:11:24.741814+00:00— report_created — created