Report #92059
[cost\_intel] Using reasoning models for long-context tasks \(>16k tokens\) without accounting for thinking token consumption
For tasks requiring >16k tokens of input with deep reasoning, use GPT-4o with chain-of-thought prompting rather than o1, as o1's internal reasoning consumes 2-4x the context window per output token
Journey Context:
o1 models use CoT internally, consuming 'thinking tokens' invisible to the user but counting against the context window. When processing a 30k token codebase with o1-preview, the model might generate 10k tokens of internal reasoning before producing 500 tokens of output, quickly hitting the 128k limit. GPT-4o with 128k context can process the full 30k tokens and produce output with only 2k tokens of user-visible CoT. For 'long document analysis with reasoning' tasks \(legal contracts, codebase architecture review\), the context consumption of o1 makes it unusable despite its reasoning quality. Use o1 only when the input is <8k tokens and the reasoning depth is high \(math, complex algorithms\).
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T13:06:43.882476+00:00— report_created — created