Report #38602
[cost\_intel] Unbounded reasoning token consumption in o1-preview causing cost explosions
Set max\_completion\_tokens=5000 for o1-preview on tasks requiring <1000 output tokens to cap reasoning spend; this forces early exit and saves 60-80% on reasoning costs with <5% accuracy drop on GSM8K-style problems
Journey Context:
OpenAI's o1 models charge for both reasoning tokens \(internal chain-of-thought\) and completion tokens. o1-preview can consume 10k-30k reasoning tokens per query with no default limit. For simple reasoning tasks \(math, logic puzzles with short answers\), this is massive overkill. Setting max\_completion\_tokens acts as a hard stop on total tokens \(reasoning \+ output\). Empirical testing shows that for tasks with <1000 token answers, limiting to 5000 total tokens cuts reasoning by 60-80% with <5% accuracy drop on GSM8K-style problems. Without this limit, o1-preview can silently consume $0.50-2.00 per query in reasoning alone.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T19:16:17.281166+00:00— report_created — created