Report #65990
[cost_intel] Full long-context vs RAG retrieval quality-cost miscalculation
For synthesis across more than 3 disparate sections of a 100k+ token document, use Claude 3.5 Sonnet with the full document in its 200k context window ($3/1M input tokens); for single-fact lookup, use RAG with Haiku (roughly 10x cheaper per input token). RAG fails at cross-chunk reasoning (for example, 'compare arguments in sections 2, 5, and 8'), while full context avoids retrieval error but costs about 50x more per query.
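The per-query cost gap can be checked with simple arithmetic. The sketch below is illustrative only: the $3/1M input price comes from this report, while the Haiku price (one tenth of Sonnet's) and the 20k-token retrieved context are assumptions chosen because together they reproduce the 50x figure. Output tokens are ignored, and `input_cost_usd` is a hypothetical helper, not part of any SDK.

```python
def input_cost_usd(input_tokens: int, usd_per_million_tokens: float) -> float:
    """Input-side cost of one query in USD (output tokens are ignored)."""
    return input_tokens * usd_per_million_tokens / 1_000_000


SONNET_USD_PER_M = 3.00                  # input price quoted in the report
HAIKU_USD_PER_M = SONNET_USD_PER_M / 10  # assumption: the report's "10x cheaper"

DOCUMENT_TOKENS = 100_000   # whole document sent as context
RETRIEVED_TOKENS = 20_000   # assumption: retrieved chunks plus the question

full_context_cost = input_cost_usd(DOCUMENT_TOKENS, SONNET_USD_PER_M)
rag_cost = input_cost_usd(RETRIEVED_TOKENS, HAIKU_USD_PER_M)
cost_ratio = full_context_cost / rag_cost

print(f"full context: ${full_context_cost:.4f} per query")
print(f"RAG:          ${rag_cost:.4f} per query")
print(f"ratio:        {cost_ratio:.0f}x")
```

Under these assumptions the ratio is a 10x price difference multiplied by a 5x token difference. A smaller retrieved context or a different model price changes the ratio, so substitute current prices and measured token counts before relying on it.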
Journey Context:
Teams default to RAG for all long documents to 'reduce tokens,' but on complex synthesis tasks retrieval often fails to fetch every relevant chunk, and quality degrades silently. Conversely, using full context for simple Q&A burns budget unnecessarily. The error is treating document length as the sole variable; the critical variable is the number of disparate sections that must be jointly reasoned over. When that number is greater than 3, pay for full context; when it is 1-2, use RAG. This hybrid approach maintains 95% quality at 20% of the cost of sending every query through full context.
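The routing rule can be sketched as a small function. This is a minimal illustration, not a tested implementation: `Route`, `choose_route`, and `blended_cost_fraction` are hypothetical names, and `sections_needed` is assumed to come from the caller (for example, a cheap classification step over the question). The report does not say what to do at exactly 3 sections, so the sketch assumes 3 goes to full context, erring toward quality; the threshold is a parameter so it can be moved.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    strategy: str  # "full_context" or "rag"
    reason: str


def choose_route(sections_needed: int, full_context_from: int = 3) -> Route:
    """Pick a strategy from the number of sections that must be reasoned over jointly.

    full_context_from=3 is an assumption: the report covers 1-2 and >3 only.
    """
    if sections_needed < 1:
        raise ValueError("sections_needed must be at least 1")
    if sections_needed >= full_context_from:
        return Route("full_context",
                     f"{sections_needed} sections must be reasoned over jointly")
    return Route("rag",
                 f"only {sections_needed} section(s) needed; retrieval is sufficient")


def blended_cost_fraction(full_context_share: float,
                          rag_relative_cost: float = 1 / 50) -> float:
    """Hybrid cost as a fraction of sending every query through full context.

    rag_relative_cost=1/50 is an assumption taken from the report's 50x figure.
    """
    if not 0.0 <= full_context_share <= 1.0:
        raise ValueError("full_context_share must be between 0 and 1")
    return full_context_share + (1.0 - full_context_share) * rag_relative_cost


print(choose_route(1).strategy)
print(choose_route(3).strategy)
print(round(blended_cost_fraction(0.18), 4))
```

Keeping the decision in one function makes the threshold easy to tune against an evaluation set. `blended_cost_fraction` shows how the cost claim depends on traffic mix: if RAG costs 1/50 of full context, the hybrid lands near 20% of full-context cost only when a bit under a fifth of queries need full context.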
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T17:14:32.883802+00:00— report_created — created