Report #76881
[cost_intel] o1 model reasoning tokens bill as output tokens but remain hidden in API responses
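Monitor the reasoning token count in the usage object explicitly (reported as usage.completion_tokens_details.reasoning_tokens in the Chat Completions API) and set reasoning_effort='low' for cost-sensitive workflows where the model supports it; budget reasoning tokens at 3-5x the visible output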
Journey Context:
o1 models generate internal reasoning chains that are billed to the user as output tokens but not returned in the response content. A response with 100 tokens of visible output might also burn 400 reasoning tokens, so 500 output tokens are billed. At $60/1M output tokens for o1-preview, this turns a perceived $0.006 call into $0.030 (5x cost). The `reasoning_effort` parameter (low/medium/high, defaulting to medium) controls the length of the reasoning chain on models that support it. Without monitoring the reasoning token count (`usage.completion_tokens_details.reasoning_tokens`), cost per request appears stochastic.
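The arithmetic above can be sketched as a small helper that splits a usage record into visible and reasoning tokens and prices both. This is a minimal sketch, not a definitive implementation. It assumes the usage record is a plain dict shaped like the Chat Completions usage object, where `completion_tokens` already includes reasoning tokens and the reasoning count sits under `completion_tokens_details.reasoning_tokens`. The names `output_cost`, `OutputCost`, and `budget_output_tokens` are hypothetical, and the $60/1M price is taken from the report, so check current pricing before relying on it.

```python
from dataclasses import dataclass

# Price quoted in the report for o1-preview output tokens (an assumption; verify current pricing).
OUTPUT_PRICE_PER_MILLION_USD = 60.0


@dataclass
class OutputCost:
    visible_tokens: int
    reasoning_tokens: int
    visible_usd: float  # what the call appears to cost from visible output alone
    total_usd: float    # what is actually billed for output tokens

    @property
    def multiplier(self) -> float:
        """Ratio of billed output cost to the cost of visible output alone."""
        return self.total_usd / self.visible_usd if self.visible_usd else float("inf")


def output_cost(usage: dict, price_per_million: float = OUTPUT_PRICE_PER_MILLION_USD) -> OutputCost:
    """Split billed output tokens into visible and reasoning parts and price them.

    Assumes usage["completion_tokens"] includes reasoning tokens.
    """
    details = usage.get("completion_tokens_details") or {}
    reasoning = details.get("reasoning_tokens", 0)
    billed = usage["completion_tokens"]
    visible = billed - reasoning
    per_token = price_per_million / 1_000_000
    return OutputCost(visible, reasoning, visible * per_token, billed * per_token)


def budget_output_tokens(expected_visible: int, reasoning_multiplier: float = 5.0) -> int:
    """Estimate billed output tokens when reasoning runs at N times the visible output."""
    return int(expected_visible * (1 + reasoning_multiplier))


if __name__ == "__main__":
    # The report's example: 100 visible tokens plus 400 reasoning tokens.
    usage = {"completion_tokens": 500, "completion_tokens_details": {"reasoning_tokens": 400}}
    cost = output_cost(usage)
    print(f"visible=${cost.visible_usd:.3f} billed=${cost.total_usd:.3f} ({cost.multiplier:.1f}x)")
```

Logging the multiplier per request, rather than only the total, makes it easier to see whether cost swings come from longer answers or from longer hidden reasoning.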
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T11:38:10.752496+00:00— report_created — created