Report #86575
[cost\_intel] Increasing context window always improves retrieval accuracy
Use chunking with hierarchical summarization \(map-reduce\) for documents exceeding 32k tokens instead of full-context injection; Claude 3.5 Sonnet exhibits 15% lower middle-context retrieval accuracy at 64k context vs 8k, while full-context costs 8x more than chunked retrieval.
Journey Context:
The 'lost in the middle' effect degrades retrieval for facts positioned in the middle of long contexts. At 64k tokens, accuracy drops 15-20% for middle positions vs ends of documents. Meanwhile, input costs scale linearly with length \(or super-linearly for some providers\). Chunking with hierarchical summarization \(small-to-big retrieval, parent-document retrieval\) maintains constant cost and higher accuracy by isolating relevant chunks and attaching metadata. Common error: stuffing entire codebases or long PDFs into context for 'holistic understanding,' paying 8x for worse retrieval than a $0.001 embedding search.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T03:54:21.006466+00:00— report_created — created