Report #87421
[cost_intel] Using o1 for long-document reasoning over 50k tokens
Use GPT-4o-128k with RAG/chunking for synthesis across more than 50k tokens; reserve o1 for concentrated complexity in windows under 10k tokens. o1's reasoning budget does not scale effectively to the full 128k context.
Journey Context:
While o1 supports a 128k context window, its reasoning is limited by logic density rather than token volume. On 'needle in a haystack' tasks that also require reasoning (find a contradiction across 100 pages, then prove it), o1 performs worse than GPT-4o-128k with hierarchical summarization. Cost compounds the problem: 100k input tokens cost ~$15 on o1 vs $3 on 4o. The recommended architecture: 4o extracts the relevant chunks via embedding search, then o1 reasons over the condensed subset of under 8k tokens. Attempting full-context reasoning on documents over 30k tokens yields diminishing returns and timeouts.
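The two-stage split described above can be sketched roughly as follows. This is a minimal illustration, not the reporter's implementation: the function names (`select_context`, `input_cost`) are hypothetical, a bag-of-words cosine score stands in for a real embedding search, tokens are approximated by whitespace-separated words, and the per-100k prices are simply the figures quoted in this report, assumed to scale linearly with input size. No model API is called; the selected chunks are what would be handed to o1.

```python
import math
import re
from collections import Counter

# Figures quoted in the report for 100k input tokens (assumed to scale linearly).
O1_USD_PER_100K = 15.0
GPT4O_USD_PER_100K = 3.0


def count_tokens(text: str) -> int:
    """Approximate token count as whitespace-separated words (not a real tokenizer)."""
    return len(text.split())


def chunk_words(text: str, chunk_size: int) -> list[str]:
    """Split text into consecutive chunks of at most chunk_size words."""
    words = text.split()
    return [" ".join(words[i:i + chunk_size]) for i in range(0, len(words), chunk_size)]


def bag_of_words(text: str) -> Counter:
    """Lowercased term counts; a stand-in for an embedding vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def select_context(document: str, query: str, chunk_size: int = 200, budget: int = 8000) -> list[str]:
    """Pick the highest-scoring chunks that fit the token budget, in document order."""
    query_vec = bag_of_words(query)
    chunks = chunk_words(document, chunk_size)
    scores = [cosine(bag_of_words(c), query_vec) for c in chunks]
    ranked = sorted(range(len(chunks)), key=lambda i: scores[i], reverse=True)

    chosen, used = [], 0
    for i in ranked:
        if scores[i] == 0.0:
            break  # remaining chunks share no terms with the query
        size = count_tokens(chunks[i])
        if used + size > budget:
            continue  # skip chunks that would overflow the budget
        chosen.append(i)
        used += size
    return [chunks[i] for i in sorted(chosen)]


def input_cost(tokens: int, usd_per_100k: float) -> float:
    """Input-token cost only; output and reasoning tokens are ignored."""
    return tokens / 100_000 * usd_per_100k


if __name__ == "__main__":
    full_o1 = input_cost(100_000, O1_USD_PER_100K)
    # Upper bound for the staged route: 4o reads all 100k tokens, o1 reads 8k.
    staged = input_cost(100_000, GPT4O_USD_PER_100K) + input_cost(8_000, O1_USD_PER_100K)
    print(f"full-context o1: ${full_o1:.2f}, staged 4o + o1: ${staged:.2f}")
```

Under these assumptions the staged route costs about $4.20 in input tokens against $15 for sending all 100k tokens to o1. Returning the chunks in document order rather than score order is a design choice here, so that o1 sees the evidence in its original sequence.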
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T05:19:31.799788+00:00— report_created — created