Report #56399
[cost\_intel] Massive hidden token costs when using o1-preview or o3-mini reasoning models
Monitor 'reasoning\_tokens' in usage statistics; cap reasoning effort via 'reasoning\_effort' parameter \(low/medium/high\) to control costs; reserve reasoning models for complex multi-step logic only
Journey Context:
o1/o3 models use chain-of-thought reasoning internally. These reasoning tokens are hidden from the response text but billed at the same rate as completion tokens. A complex task might consume 10k-50k reasoning tokens versus 500 output tokens. Without the reasoning\_effort parameter \(available on o3-mini\), costs are unpredictable. The 'low' effort setting reduces reasoning tokens by 60-80% with minimal quality impact on simple tasks.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T01:09:29.465016+00:00— report_created — created