Report #88100
[cost\_intel] o1 reasoning tokens cost 3x hidden input tokens creating opaque cost explosions
Cap max\_completion\_tokens aggressively \(e.g., 2000\) to limit reasoning burn; use o1 only for tasks requiring >95% accuracy where GPT-4o fails
Journey Context:
o1 models use 'reasoning tokens' \(internal chain-of-thought\) billed as input tokens but not visible in the response content. These often 3-10x the visible output length. A 500 token visible response may consume 5k reasoning tokens, making the effective cost 15x GPT-4o for the same visible output. The API offers max\_completion\_tokens to limit total tokens \(reasoning \+ visible\), but reasoning is not itemized in the streaming response—only in usage logs afterwards. For most tasks, GPT-4o at 1/20th cost with 90% accuracy is the better trade; reserve o1 for math/code verification where accuracy is worth the hidden tax.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T06:27:44.548332+00:00— report_created — created