Report #82146
[cost_intel] Using o1 for long-document Q&A instead of RAG with GPT-4o
For document Q&A over 100k tokens, use GPT-4o with its 128k context, or Claude 3.5 Sonnet with chunking. o1's 200k context costs several times the per-token price and is rarely needed unless the task requires reasoning across distant sections of the document.
Journey Context:
People often assume reasoning models help with 'understanding' long documents. In practice, document Q&A is mostly retrieval + synthesis, not multi-step reasoning. GPT-4o (128k context) and Claude 3.5 Sonnet handle 100k+ tokens of context with high fidelity at roughly $2.50-3.00 per 1M input tokens. o1-preview is listed at $15 per 1M input tokens and $60 per 1M output tokens, and its hidden reasoning tokens are billed as output, so it costs about 6x more at list price and more still once reasoning tokens are counted. That reasoning capability is wasted unless the task requires connecting facts from page 1 and page 200 through complex logical deduction (rare). Standard RAG (chunking + embedding search + GPT-4o synthesis) sends only the retrieved chunks rather than the whole document, which makes it roughly 100x cheaper per query than full-context o1, at the same quality for 95% of document Q&A. The 'cliff' for cheap models comes when you need global reasoning across the full context without retrieval hints (e.g., 'compare the thesis in the intro with the conclusion's implications').
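The per-query arithmetic behind this comparison can be sketched as follows. This is a rough estimate under stated assumptions, not a billing tool: the prices are illustrative list prices in USD per 1M tokens ($2.50 input / $10 output for GPT-4o, $15 input / $60 output for o1-preview), and the token counts (`RAG_TOKENS`, `ANSWER_TOKENS`, `REASONING_TOKENS`) are hypothetical values chosen for illustration. Check current provider pricing before relying on any of these numbers.

```python
# Illustrative list prices in USD per 1M tokens (assumptions; verify current pricing).
PRICES = {
    "gpt-4o": {"input": 2.50, "output": 10.00},
    "o1-preview": {"input": 15.00, "output": 60.00},
}


def query_cost(model, input_tokens, output_tokens, reasoning_tokens=0):
    """Estimate the USD cost of one request.

    Hidden reasoning tokens are treated as billed output tokens.
    """
    price = PRICES[model]
    billed_output = output_tokens + reasoning_tokens
    total = input_tokens * price["input"] + billed_output * price["output"]
    return total / 1_000_000


# Hypothetical workload: one question against a 100k-token document.
DOC_TOKENS = 100_000       # whole document sent as context
RAG_TOKENS = 4_000         # top-k retrieved chunks plus the question (assumed)
ANSWER_TOKENS = 500        # visible answer length (assumed)
REASONING_TOKENS = 5_000   # hidden reasoning tokens for o1-preview (assumed)

full_4o = query_cost("gpt-4o", DOC_TOKENS, ANSWER_TOKENS)
full_o1 = query_cost("o1-preview", DOC_TOKENS, ANSWER_TOKENS, REASONING_TOKENS)
rag_4o = query_cost("gpt-4o", RAG_TOKENS, ANSWER_TOKENS)

print(f"GPT-4o, full context:     ${full_4o:.3f}")
print(f"o1-preview, full context: ${full_o1:.3f}")
print(f"GPT-4o with RAG:          ${rag_4o:.3f}")
print(f"o1 full context vs RAG:   {full_o1 / rag_4o:.0f}x")
```

Under these assumptions, full-context o1-preview comes out at about 7x the cost of full-context GPT-4o and over 100x the cost of the RAG path. Most of the RAG saving comes from sending a few thousand tokens instead of the whole document, so the ratio shifts with chunk count and document length. Embedding and indexing costs are not modeled here.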
Lifecycle
2026-06-21T20:28:26.723610+00:00— report_created — created