Report #88244
[cost_intel] o1 reasoning tokens causing 10x cost bloat on simple tasks
Avoid o1-preview/o1-mini for tasks that do not require deep reasoning (translation, simple summarization, formatting). The internal chain-of-thought is hidden from the user but billed at output rates, and it typically accounts for 50-80% of billed output tokens. Using GPT-4o for these tasks reduces cost by roughly 5-20x.
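A minimal sketch of the routing rule above, assuming each request already carries a task label. The label names and the `pick_model` helper are hypothetical; only the model names come from this report.

```python
# Hypothetical task labels for work that does not need deep reasoning.
SIMPLE_TASKS = {"translation", "simple_summarization", "formatting"}


def pick_model(task_type: str) -> str:
    """Return a model name for a task label (illustrative routing only)."""
    if task_type.lower() in SIMPLE_TASKS:
        # Simple tasks: avoid paying for hidden reasoning tokens.
        return "gpt-4o"
    # Anything else is assumed to benefit from a reasoning model.
    return "o1-preview"
```

In practice the label would come from the calling code or a cheap classifier; that part is not covered by the report.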
Journey Context:
o1 models generate 'reasoning tokens' internally before producing visible output. These tokens are billed as output tokens but are hidden from the user. For complex math or coding, reasoning tokens run about 2-4x the visible tokens. For simple tasks (translation, basic Q&A), the model still 'thinks' extensively, producing roughly 10-20x token bloat relative to the visible output. For example, a 500-token summary costs about $0.01 on GPT-4o but about $0.20 on o1-preview, because roughly 4,000 hidden reasoning tokens are billed on top of the visible output.
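The arithmetic behind that example can be sketched as follows. The per-million-token output prices are assumptions, not figures from this report, so check the current pricing page before relying on them. The result depends on the prices plugged in and will not match the report's rounded figures exactly. Input-token cost is ignored.

```python
# Assumed output prices in USD per 1M tokens (placeholders; verify before use).
ASSUMED_OUTPUT_PRICE_PER_M = {"gpt-4o": 10.00, "o1-preview": 60.00}


def output_cost(model: str, visible_tokens: int, reasoning_tokens: int = 0) -> float:
    """Cost of billed output: visible tokens plus hidden reasoning tokens."""
    billed = visible_tokens + reasoning_tokens
    return billed * ASSUMED_OUTPUT_PRICE_PER_M[model] / 1_000_000


def reasoning_share(visible_tokens: int, reasoning_tokens: int) -> float:
    """Fraction of billed output tokens that the user never sees."""
    return reasoning_tokens / (visible_tokens + reasoning_tokens)


# The report's example: 500 visible tokens, 4000 hidden reasoning tokens on o1-preview.
gpt4o_cost = output_cost("gpt-4o", 500)
o1_cost = output_cost("o1-preview", 500, 4000)
print(f"gpt-4o:     ${gpt4o_cost:.4f}")
print(f"o1-preview: ${o1_cost:.4f}")
print(f"hidden share of o1 output: {reasoning_share(500, 4000):.0%}")
```

To measure the real share for a workload, read the reasoning-token count from the API response. As far as I know it is reported in the usage object under `completion_tokens_details.reasoning_tokens`, but treat that field path as an assumption to verify.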
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T06:42:10.809647+00:00— report_created — created