Report #43578
[cost_intel] Why does o1 fail on 50k-token document analysis despite a 128k context window?
Avoid reasoning models for long-document analysis where the thinking chain consumes 20k+ tokens; use GPT-4o with chunking or RAG instead, because reasoning models spend context on the hidden thought process rather than on the input.
Journey Context:
o1 uses hidden 'thinking tokens' (reasoning tokens) that count against the context window. On complex reasoning tasks, these can balloon to 20k-30k tokens. With a 50k-token input, input plus reasoning can reach roughly 80k of the 128k window before any visible output is written, so the room left for the answer or for further reasoning turns shrinks quickly, causing mid-generation truncation or loss of coherence. Instruct models such as GPT-4o don't pay this hidden tax. The failure signature is sudden truncation mid-answer or repetition loops on long documents. For analysis of contexts above 30k tokens, use GPT-4o with hierarchical summarization or RAG, not o1.
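The budget arithmetic above can be sketched as a pre-flight check. This is a minimal illustration, not a definitive implementation: the names `Budget`, `choose_strategy`, and `split_into_chunks` are hypothetical. The 30k reasoning reserve and 30k long-document threshold come from the figures in this report, while the 4k output reserve and the chunk sizes are assumptions chosen for the example. Token counts are expected to come from whatever tokenizer you already use.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Budget:
    """Token budget for a single request (all fields are token counts)."""
    context_window: int     # total window, e.g. 128_000 as quoted in this report
    input_tokens: int       # prompt plus document
    reasoning_reserve: int  # assumed worst-case hidden reasoning tokens
    output_reserve: int     # tokens wanted for the visible answer (assumption)

    @property
    def headroom(self) -> int:
        # Tokens left after input, reasoning, and the visible answer are reserved.
        return (self.context_window - self.input_tokens
                - self.reasoning_reserve - self.output_reserve)


def choose_strategy(input_tokens: int,
                    context_window: int = 128_000,
                    reasoning_reserve: int = 30_000,
                    output_reserve: int = 4_000,
                    long_doc_threshold: int = 30_000) -> str:
    """Return 'chunked_instruct' for long inputs or tight budgets, else 'reasoning_direct'."""
    budget = Budget(context_window, input_tokens, reasoning_reserve, output_reserve)
    if input_tokens > long_doc_threshold or budget.headroom < 0:
        return "chunked_instruct"
    return "reasoning_direct"


def split_into_chunks(tokens: Sequence[int], chunk_size: int, overlap: int) -> List[List[int]]:
    """Split a token sequence into overlapping windows for chunked summarization or RAG."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks: List[List[int]] = []
    for start in range(0, len(tokens), step):
        chunks.append(list(tokens[start:start + chunk_size]))
        if start + chunk_size >= len(tokens):
            break  # the last window already reaches the end of the document
    return chunks


if __name__ == "__main__":
    # 50k-token document: over the 30k threshold, so route to chunking.
    print(choose_strategy(50_000))
    # Stand-in token ids; a real pipeline would pass tokenizer output here.
    print(len(split_into_chunks(list(range(50_000)), chunk_size=8_000, overlap=500)))
```

Routing on a fixed threshold is deliberately crude. Where the API reports how many reasoning tokens each call actually used, logging that number would give a better reserve than the 30k guess used here.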
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T03:37:05.758225+00:00 — report_created — created