Report #45947
[cost_intel] Underestimating long-context requirements for book-length RAG
Use Gemini 1.5 Pro (2M-token context) instead of Claude 3.5 Sonnet (200k-token limit) for documents over 100k tokens to avoid chunking complexity. It costs more per input token ($3.50 vs $3 per 1M, about 1.17x), but it largely avoids 'needle-in-a-haystack' recall failures (99% vs roughly 70% for chunked processing beyond 200k tokens) that force expensive reruns.
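A minimal routing sketch of this recommendation, in Python. The model labels, the `choose_model` and `single_call_input_cost` helpers, and the constants are hypothetical illustrations, not API model IDs. Prices are the figures quoted in this report, not current list prices (Gemini pricing may also be tiered by prompt length), so verify them before relying on the numbers.

```python
# Routing sketch for the report's rule: send documents over 100k tokens to
# Gemini 1.5 Pro instead of chunking them for Claude 3.5 Sonnet.
# Model labels and prices are illustrative (the report's quoted figures),
# not API model IDs or current list prices.

LONG_DOC_THRESHOLD = 100_000      # tokens; routing cutoff from the report
GEMINI_CONTEXT_LIMIT = 2_000_000  # tokens; Gemini 1.5 Pro context window

# USD per 1M input tokens, as quoted in the report (verify before use).
PRICE_PER_M_INPUT = {
    "gemini-1.5-pro": 3.50,
    "claude-3.5-sonnet": 3.00,
}


def choose_model(doc_tokens: int) -> str:
    """Return the model label the routing rule picks for a whole-document call."""
    if doc_tokens > GEMINI_CONTEXT_LIMIT:
        raise ValueError("document exceeds the 2M-token window; chunking is unavoidable")
    if doc_tokens > LONG_DOC_THRESHOLD:
        return "gemini-1.5-pro"
    return "claude-3.5-sonnet"


def single_call_input_cost(doc_tokens: int) -> float:
    """Input-token cost (USD) of sending the whole document in one call."""
    model = choose_model(doc_tokens)
    return doc_tokens / 1_000_000 * PRICE_PER_M_INPUT[model]


if __name__ == "__main__":
    for tokens in (80_000, 500_000, 1_500_000):
        print(tokens, choose_model(tokens), round(single_call_input_cost(tokens), 4))
```

The 100k threshold sits below Claude's 200k window, so any document routed to Claude fits in one call; anything past 2M tokens still has to be chunked regardless of model.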
Journey Context:
Teams that process legal briefs or novels typically chunk them into 100k-token pieces for Claude, which loses cross-chapter context and causes 30%+ recall degradation on distant references. Gemini 1.5 Pro's 2M-token context window can hold the entire document in a single call and maintains >99% needle recall at 1M tokens. The upfront cost per call is higher, but it avoids the costs of a 'chunk and rerank' pipeline (multiple API calls plus embedding costs) along with its accuracy loss.
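To make the trade-off between a higher upfront price and pipeline costs concrete, here is a rough expected-cost model in Python. It assumes, hypothetically, that the recall figures can be treated as independent per-attempt success rates, that a failure is detected and triggers a full rerun, that every chunk is sent once per attempt with a fixed prompt overhead, and that embedding happens only once. The function names, default parameters, and example inputs are illustrative placeholders, not measured values.

```python
import math

# Expected-cost comparison: one full-context call vs. a chunk-and-rerank pipeline.
# Every parameter is a hypothetical input, not a measured value.


def expected_attempts(success_rate: float) -> float:
    """Mean attempts until success if attempts succeed independently (geometric model)."""
    if not 0.0 < success_rate <= 1.0:
        raise ValueError("success_rate must be in (0, 1]")
    return 1.0 / success_rate


def full_context_cost(doc_tokens: int, price_per_m: float, success_rate: float) -> float:
    """Expected input cost (USD) when the whole document is resent on each attempt."""
    per_attempt = doc_tokens / 1_000_000 * price_per_m
    return per_attempt * expected_attempts(success_rate)


def chunked_pipeline_cost(
    doc_tokens: int,
    price_per_m: float,
    success_rate: float,
    chunk_tokens: int = 100_000,
    overhead_tokens_per_call: int = 2_000,  # hypothetical prompt overhead per chunk call
    embed_price_per_m: float = 0.0,         # hypothetical embedding price (USD per 1M tokens)
) -> float:
    """Expected cost (USD): embed once, then rerun every chunk call on each failure."""
    n_chunks = math.ceil(doc_tokens / chunk_tokens)
    llm_tokens = doc_tokens + n_chunks * overhead_tokens_per_call
    per_attempt = llm_tokens / 1_000_000 * price_per_m
    embedding = doc_tokens / 1_000_000 * embed_price_per_m
    return embedding + per_attempt * expected_attempts(success_rate)


if __name__ == "__main__":
    doc = 500_000  # hypothetical book-length document
    full = full_context_cost(doc, price_per_m=3.50, success_rate=0.99)
    chunked = chunked_pipeline_cost(doc, price_per_m=3.00, success_rate=0.70,
                                    embed_price_per_m=0.10)
    print(f"full context: ${full:.2f}   chunked: ${chunked:.2f}")
```

With the placeholder inputs in the `__main__` block this prints about $1.77 for the full-context call vs $2.24 for the chunked pipeline. The comparison can flip with different success rates, overheads, or prices, so substitute your own measurements.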
⚠ Workarounds are unverified; always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T07:35:47.334404+00:00— report_created — created