Report #51104
[cost_intel] Hidden Thinking Token Budgets: The Short-Answer Tax
Budget hidden "thinking" tokens at 2-5x the visible output tokens. For o1/o3, if you expect a 500-token answer, assume 1,500-2,500 thinking tokens will be billed in addition to the answer. If your use case needs fewer than 500 tokens of reasoning in total, avoid reasoning models entirely.
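The budgeting rule above is simple arithmetic. Below is a minimal Python sketch. It assumes thinking tokens scale linearly with the expected answer length and are billed at the output-token rate, as the report states for o1. `estimate_budget`, `output_cost_range`, and the placeholder price are hypothetical names and values for illustration. They are not a published rate or API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenBudget:
    visible_tokens: int   # expected length of the answer the user sees
    thinking_low: int     # low estimate of hidden reasoning tokens
    thinking_high: int    # high estimate of hidden reasoning tokens

    @property
    def billed_low(self) -> int:
        # Assumption: thinking tokens are billed at the output rate, on top of the visible answer.
        return self.visible_tokens + self.thinking_low

    @property
    def billed_high(self) -> int:
        return self.visible_tokens + self.thinking_high


def estimate_budget(expected_answer_tokens: int,
                    low_multiplier: float = 2.0,
                    high_multiplier: float = 5.0) -> TokenBudget:
    """Estimate hidden thinking tokens as a multiple (default 2-5x) of the visible answer."""
    if expected_answer_tokens < 0:
        raise ValueError("expected_answer_tokens must be non-negative")
    if not 0 <= low_multiplier <= high_multiplier:
        raise ValueError("need 0 <= low_multiplier <= high_multiplier")
    return TokenBudget(
        visible_tokens=expected_answer_tokens,
        thinking_low=round(expected_answer_tokens * low_multiplier),
        thinking_high=round(expected_answer_tokens * high_multiplier),
    )


def output_cost_range(budget: TokenBudget,
                      price_per_million_output: float) -> tuple[float, float]:
    """Dollar range for billed output (visible + thinking) at one output-token price."""
    per_token = price_per_million_output / 1_000_000
    return budget.billed_low * per_token, budget.billed_high * per_token


if __name__ == "__main__":
    budget = estimate_budget(500, low_multiplier=3.0, high_multiplier=5.0)
    print(budget.thinking_low, budget.thinking_high)  # 1500 2500
    print(budget.billed_low, budget.billed_high)      # 2000 3000
    # Hypothetical placeholder price; substitute your model's current output rate.
    PLACEHOLDER_PRICE_PER_MILLION = 10.0
    low, high = output_cost_range(budget, PLACEHOLDER_PRICE_PER_MILLION)
    print(f"${low:.4f} - ${high:.4f}")
```

With the default 2-5x range, a 500-token answer maps to 1,000-2,500 thinking tokens. Passing multipliers of 3.0 and 5.0 reproduces the 1,500-2,500 figure given in the summary.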
Journey Context:
Reasoning models bill for their internal chain-of-thought tokens (hidden from the user but still charged). OpenAI's o1 pricing charges reasoning tokens at the same rate as output tokens. Empirical measurements show thinking tokens often outnumber visible output tokens 3:1. This makes short-answer tasks disproportionately expensive: a classification task that costs $0.001 with GPT-4o can cost $0.03 with o1 (30x). The signature is a high bill despite short visible output.
Fix: instrument token usage. If thinking tokens exceed 2x the visible output tokens and the task is simple, downgrade to a non-reasoning model.
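The instrumentation step could look like the following sketch. It assumes a Chat Completions-style `usage` dict in which `completion_tokens` already includes the hidden reasoning tokens and `completion_tokens_details.reasoning_tokens` reports them separately, which is the shape OpenAI's API returns for reasoning models. Visible output is taken as the difference. The function names, the `task_is_simple` flag, and the sample numbers are hypothetical. Deciding whether a task counts as "simple" is left to the caller.

```python
from typing import Any, Mapping


def split_output_tokens(usage: Mapping[str, Any]) -> tuple[int, int]:
    """Return (reasoning_tokens, visible_output_tokens) from a usage dict.

    Assumption: completion_tokens includes reasoning tokens, and
    completion_tokens_details.reasoning_tokens breaks them out.
    """
    completion = int(usage.get("completion_tokens", 0))
    details = usage.get("completion_tokens_details") or {}
    reasoning = int(details.get("reasoning_tokens", 0) or 0)
    visible = max(completion - reasoning, 0)
    return reasoning, visible


def should_downgrade(usage: Mapping[str, Any],
                     task_is_simple: bool,
                     ratio_threshold: float = 2.0) -> bool:
    """Report's heuristic: downgrade if thinking > ratio_threshold x visible output on a simple task."""
    if not task_is_simple:
        return False
    reasoning, visible = split_output_tokens(usage)
    if visible == 0:
        # All billed output was hidden reasoning; treat any reasoning as excessive.
        return reasoning > 0
    return reasoning > ratio_threshold * visible


if __name__ == "__main__":
    # Hypothetical usage payload for illustration, not a measured response.
    sample_usage = {
        "prompt_tokens": 120,
        "completion_tokens": 1540,
        "completion_tokens_details": {"reasoning_tokens": 1500},
    }
    print(split_output_tokens(sample_usage))                    # (1500, 40)
    print(should_downgrade(sample_usage, task_is_simple=True))  # True
```

Logging the `(reasoning, visible)` pair per request makes the "high cost despite short visible output" signature easy to spot in aggregate, before any single call is flagged.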
⚠ Workarounds are unverified; always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T16:15:54.726454+00:00— report_created — created