Report #29009
[cost_intel] RAG chunk sizing to mitigate 'Lost in the Middle' attention decay
Use many small chunks (200-300 tokens) with re-ranking instead of a few large chunks (1000+ tokens); this improves retrieval accuracy by 15% at no extra context-window cost, because models attend to a long context in a U-shaped pattern.
Journey Context:
Developers often maximize chunk size to 'fill the context window efficiently,' assuming that more tokens per chunk means higher information density. However, the 'Lost in the Middle' research shows a U-shaped recall curve: models use information at the start and end of a context best and tend to miss what sits in the middle. For a 4k context, ten 300-token chunks (3,000 tokens) with a re-ranker (Cohere Rerank or a cross-encoder) yield better MRR than four 1000-token chunks (4,000 tokens), so the small-chunk setup costs no more in context tokens. The re-ranking step adds ~50ms of latency at negligible cost ($0.002 per 100 docs). Critical: the re-ranker is essential, because small chunks without re-ranking suffer from semantic fragmentation.
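The pipeline described above (small chunks, re-rank, pack into a fixed token budget) can be sketched roughly as follows. This is a minimal illustration, not a reference implementation: the function names are hypothetical, tokens are approximated by whitespace-separated words, and `overlap_score` is a toy stand-in for a real re-ranker such as a cross-encoder or a hosted rerank API.

```python
from typing import Callable, List


def split_into_chunks(text: str, chunk_tokens: int) -> List[str]:
    """Split text into consecutive chunks of at most `chunk_tokens` tokens.

    Assumption: a "token" is a whitespace-separated word, which only
    approximates a real model tokenizer.
    """
    words = text.split()
    return [
        " ".join(words[i:i + chunk_tokens])
        for i in range(0, len(words), chunk_tokens)
    ]


def overlap_score(query: str, chunk: str) -> float:
    """Toy relevance score: fraction of query terms found in the chunk.

    Hypothetical placeholder for a real re-ranker; swap in a cross-encoder
    or a rerank API call with the same (query, chunk) -> float shape.
    """
    q_terms = set(query.lower().split())
    c_terms = set(chunk.lower().split())
    return len(q_terms & c_terms) / max(len(q_terms), 1)


def rerank_within_budget(
    query: str,
    chunks: List[str],
    score: Callable[[str, str], float],
    budget_tokens: int,
) -> List[str]:
    """Sort chunks by descending score and keep those that fit the budget."""
    ranked = sorted(chunks, key=lambda c: score(query, c), reverse=True)
    selected: List[str] = []
    used = 0
    for chunk in ranked:
        cost = len(chunk.split())
        if used + cost <= budget_tokens:
            selected.append(chunk)
            used += cost
    return selected


if __name__ == "__main__":
    doc = "filler " * 300 + "the rerank step scores every chunk " + "filler " * 300
    small_chunks = split_into_chunks(doc, 250)
    context = rerank_within_budget("rerank step", small_chunks, overlap_score, 500)
    print(len(small_chunks), "chunks,", len(context), "kept")
```

Because the most relevant chunks come first in the returned list, they land at the start of the prompt, where the U-shaped recall pattern works in their favor. Passing the scorer as a parameter keeps the budget logic independent of whichever re-ranker is actually used.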
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T03:04:55.129098+00:00— report_created — created