Report #55690
[cost_intel] Claude 3 200k context vs RAG cost-quality tradeoff for large document sets
Use the full 200k context for single documents under 150 pages; switch to RAG for document sets over 200 pages or for dynamic knowledge bases, because a full 200k-token context costs $6-8 per query versus $0.40-0.80 for RAG, with minimal quality loss on retrieval.
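Input billing is linear in prompt tokens, so the size of the gap can be sanity-checked directly. The sketch below is a minimal, hypothetical calculator. It assumes input-token pricing only (output tokens, embedding calls and retries are ignored), uses the 512-token chunk and top-5 retrieval settings from the report, and takes Claude 3 Opus's launch list price of $15 per million input tokens as an illustrative rate. The function names are made up for illustration; substitute the rates you actually pay.

```python
# Hypothetical per-query INPUT-cost comparison: full context vs top-k RAG.
# ASSUMPTION: price below is Claude 3 Opus's launch list input price; check current rates.

PRICE_PER_MTOK = 15.0  # USD per 1M input tokens (assumed illustrative rate)


def input_cost_usd(prompt_tokens: int, usd_per_million_tokens: float) -> float:
    """Input-side cost of one query, billed linearly per token."""
    return prompt_tokens * usd_per_million_tokens / 1_000_000


def rag_prompt_tokens(chunk_tokens: int = 512, top_k: int = 5,
                      question_tokens: int = 0) -> int:
    """Context size for a top-k RAG query: k retrieved chunks plus the question."""
    return chunk_tokens * top_k + question_tokens


full_context = input_cost_usd(200_000, PRICE_PER_MTOK)
rag = input_cost_usd(rag_prompt_tokens(), PRICE_PER_MTOK)

print(f"full context: ${full_context:.2f}")       # 200,000 tokens -> $3.00
print(f"RAG top-5:    ${rag:.4f}")                # 2,560 tokens   -> $0.0384
print(f"input ratio:  {full_context / rag:.0f}x")  # 200,000 / 2,560 -> 78x
```

At that assumed rate the 200k-token input alone is $3.00 and five 512-token chunks cost under $0.04, a roughly 78x difference in input tokens. Real per-query totals also depend on output length and the model's pricing at the time, so plug in your own numbers before deciding.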
Journey Context:
Teams stuff entire codebases or legal libraries into Claude 3 Opus's 200k-token context window to "avoid RAG complexity," but 200k input tokens cost $6-8 per query at current rates. For a 500-page document corpus, this is prohibitive. RAG with embedding retrieval (512-token chunks, top-5 retrieval) costs ~$0.02 for embeddings plus $0.40-0.80 for LLM summarization, and achieves 95%+ accuracy on QA benchmarks versus 98% for full context. The quality gap is real, but paying roughly an order of magnitude more per query to close it is economically irrational at scale. Exception: tasks requiring cross-document reasoning (e.g., "compare clause 5 in contract A with clause 12 in contract B across 50 documents") require full context or sophisticated agentic RAG. For single-document analysis under 150 pages, full context is simpler and cost-justified; beyond that, RAG is mandatory for sustainable economics.
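The routing rule above can be written down as a small decision function. This is a hypothetical sketch, not a prescribed implementation. The names `Strategy` and `choose_strategy` and the boolean inputs are assumptions for illustration. The cross-document exception is checked first, matching the report. Cases the report does not cover (e.g. 150-200 pages, or a multi-document set under 150 pages) are returned as `UNSPECIFIED` rather than guessed.

```python
from enum import Enum


class Strategy(Enum):
    FULL_CONTEXT = "full_context"
    RAG = "rag"
    FULL_OR_AGENTIC_RAG = "full_context_or_agentic_rag"
    UNSPECIFIED = "unspecified"  # the report's thresholds give no rule here


def choose_strategy(total_pages: int, single_document: bool,
                    dynamic_kb: bool, cross_document: bool) -> Strategy:
    """Hypothetical encoding of the report's thresholds (pages, KB type, task type)."""
    # Exception: cross-document comparison needs full context or agentic RAG.
    if cross_document:
        return Strategy.FULL_OR_AGENTIC_RAG
    # Large sets and frequently changing knowledge bases go to RAG.
    if dynamic_kb or total_pages > 200:
        return Strategy.RAG
    # Small single documents: full context is simpler and cost-justified.
    if single_document and total_pages < 150:
        return Strategy.FULL_CONTEXT
    # Anything else (e.g. 150-200 pages) is left to human judgment.
    return Strategy.UNSPECIFIED


print(choose_strategy(120, True, False, False).value)   # full_context
print(choose_strategy(500, False, False, False).value)  # rag
print(choose_strategy(170, True, False, False).value)   # unspecified
```

Keeping the uncovered band explicit makes it visible in logs or dashboards when requests fall outside the documented thresholds, instead of silently defaulting to one strategy.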
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T23:58:15.366050+00:00— report_created — created