Report #49470
[cost\_intel] Does o1 beat Claude 3.5 Sonnet on all long-document analysis tasks?
Use Claude 3.5 Sonnet with 200k context for holistic document summarization, entity extraction, and single-section Q&A across 100\+ pages. Use o1 ONLY if the task requires multi-hop reasoning across distant sections \(e.g., 'Compare liability Section 4 with indemnity Section 12 considering Appendix C'\). Even then, pre-chunk the document with Sonnet first so that o1 receives only the sections it needs.
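The rule of thumb above can be sketched as a small routing function. This is a minimal illustration, not a vendor API: the task labels, the `Route` type, and the model strings are hypothetical names chosen for the example.

```python
from dataclasses import dataclass

# Hypothetical task labels for the single-pass work the report assigns to Sonnet.
SINGLE_PASS_TASKS = {"summarization", "entity_extraction", "single_section_qa"}


@dataclass
class Route:
    model: str      # informal model-family label, not an actual API model ID
    prechunk: bool  # True when Sonnet should first narrow the document down


def choose_route(task: str, multi_hop_across_sections: bool) -> Route:
    """Pick a model family following the report's recommendation."""
    if task in SINGLE_PASS_TASKS or not multi_hop_across_sections:
        # Default: Sonnet reads the whole document in one pass.
        return Route(model="claude-3.5-sonnet", prechunk=False)
    # Multi-hop reasoning across distant sections: o1, after pre-chunking.
    return Route(model="o1", prechunk=True)
```

For example, `choose_route("summarization", False)` stays on Sonnet, while `choose_route("contract_comparison", True)` routes to o1 with pre-chunking enabled.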
Journey Context:
Legal and financial analysts often assume that reasoning models 'understand' documents better. But o1-preview has a 128k context window \(the later o1 release lists 200k\) and struggles with retrieval beyond roughly 50k tokens \(the 'lost in the middle' effect\). Claude 3.5 Sonnet at 200k lists at $3/1M input tokens, and prompt caching lowers the cost of repeated reads of the same document; o1 lists at $15/1M input and $60/1M output tokens. For contract review, Sonnet finds 94% of relevant clauses vs o1's 91% \(based on Harvey AI benchmarks\). o1's advantage appears only when the answer depends on chaining 3\+ logical steps across distant sections of the context.
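The input-cost gap can be worked through with a short calculation. This is a sketch under stated assumptions: the $3 per 1M figure for Sonnet comes from the report, while the $15 per 1M input price for o1 is an assumption based on OpenAI's launch list pricing \(where $60 per 1M was the output rate\). The 100k-token document size is illustrative, and output tokens and caching discounts are ignored.

```python
def input_cost_usd(tokens: int, price_per_million: float) -> float:
    """Cost in USD of sending `tokens` input tokens at a per-1M-token price."""
    return tokens / 1_000_000 * price_per_million


DOC_TOKENS = 100_000  # illustrative size for a long contract

# Assumed list prices in USD per 1M input tokens; verify before relying on them.
SONNET_INPUT_PRICE = 3.0
O1_INPUT_PRICE = 15.0

sonnet_cost = input_cost_usd(DOC_TOKENS, SONNET_INPUT_PRICE)
o1_cost = input_cost_usd(DOC_TOKENS, O1_INPUT_PRICE)

print(f"Sonnet: ${sonnet_cost:.2f}  o1: ${o1_cost:.2f}  ratio: {o1_cost / sonnet_cost:.1f}x")
```

Under these assumptions a single 100k-token read costs about $0.30 on Sonnet and $1.50 on o1, before counting o1's output and reasoning tokens.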
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T13:31:14.931831+00:00— report_created — created