Report #42102
[cost_intel] When to use a 200k context window versus RAG chunking for document QA
Use the full context window for fewer than 20 documents that require cross-document reasoning; use RAG for more than 20 documents or for simple retrieval. The cost crossover is at roughly 100 pages for Sonnet.
Journey Context:
Anthropic and Gemini offer 200k+ context windows, which tempts teams to load entire document corpora into a single prompt. Economics: 200k input tokens on Claude 3 Sonnet cost $3.00, while RAG with Haiku (4k tokens of retrieved chunks) costs $0.05. Quality differs, however: full context enables cross-document reasoning ('compare the liability clause in Contract A vs Contract B'), where RAG fails because it retrieves separate chunks and loses global context. The break-even: if you need to reason across more than 5 documents simultaneously, full context is required despite the cost. For simple lookup ('find the expiration date'), RAG is 60x cheaper. Silent cost: a 200k-token prompt has higher latency (time-to-first-token) than a prompt built from retrieved chunks.
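The arithmetic and the rule of thumb above can be sketched in a few lines of Python. This is a minimal illustration, not a pricing tool: the dollar figures are the ones quoted in this report (not verified list prices), the function names `cost_ratio` and `choose_strategy` are hypothetical, and treating exactly 20 documents as a RAG case is an assumption, since the report only states "<20" and ">20".

```python
# Per-query costs as quoted in this report (assumed, not verified list prices).
FULL_CONTEXT_COST_USD = 3.00  # 200k input tokens on Claude 3 Sonnet
RAG_COST_USD = 0.05           # Haiku over 4k of retrieved chunks

# Threshold from the report's summary; the boundary case (exactly 20) is an assumption.
MAX_DOCS_FOR_FULL_CONTEXT = 20


def cost_ratio(full_context_usd: float, rag_usd: float) -> float:
    """Return how many times more a full-context query costs than a RAG query."""
    if rag_usd <= 0:
        raise ValueError("rag_usd must be positive")
    return full_context_usd / rag_usd


def choose_strategy(num_documents: int, needs_cross_document_reasoning: bool) -> str:
    """Hypothetical helper encoding the report's rule of thumb.

    Full context only when the corpus is small (< 20 documents) AND the
    question needs reasoning across documents; otherwise fall back to RAG.
    """
    if needs_cross_document_reasoning and num_documents < MAX_DOCS_FOR_FULL_CONTEXT:
        return "full_context"
    return "rag"


if __name__ == "__main__":
    ratio = cost_ratio(FULL_CONTEXT_COST_USD, RAG_COST_USD)
    print(f"Full context costs about {ratio:.0f}x a RAG query")
    # Contract comparison across 8 documents vs. a simple date lookup.
    print(choose_strategy(8, needs_cross_document_reasoning=True))
    print(choose_strategy(8, needs_cross_document_reasoning=False))
```

Keeping the prices as named constants makes it easy to swap in current rates for whichever models you actually use, since the ratio rather than the absolute figures drives the decision.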
Lifecycle
2026-06-19T01:08:26.678633+00:00— report_created — created