Report #45064
[counterintuitive] 100k+ context windows mean you can just dump all documents into the prompt instead of chunking and retrieving
Continue using chunking and targeted retrieval for large document sets; only pass the most relevant chunks to the model to avoid attention dilution and increased latency/cost.
Journey Context:
With the advent of massive context windows, developers often abandon RAG pipelines in favor of stuffing the entire document set into the prompt. However, models are prone to the "lost in the middle" effect: they tend to overlook information placed in the middle of a long context and favor what appears near the beginning or end. Furthermore, processing 100k tokens on every request adds substantial latency and cost, and it often yields worse accuracy than a precise RAG pipeline that surfaces only the most relevant chunks.
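The chunk-then-retrieve approach described above can be illustrated with a minimal sketch. It is not taken from the report: the function names (chunk_text, top_k_chunks, build_prompt), the chunk sizes, and the bag-of-words cosine scoring are all assumptions chosen to keep the example dependency-free. A real pipeline would more likely score chunks with an embedding model and a vector index.

```python
import math
import re
from collections import Counter


def chunk_text(text: str, chunk_size: int = 50, overlap: int = 10) -> list[str]:
    """Split text into overlapping chunks of about chunk_size words (hypothetical helper)."""
    step = chunk_size - overlap
    if step <= 0:
        raise ValueError("overlap must be smaller than chunk_size")
    words = text.split()
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last window already reached the end of the text
    return chunks


def _vectorize(text: str) -> Counter:
    # Bag-of-words term counts; a stand-in for a real embedding model.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def _cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k_chunks(query: str, chunks: list[str], k: int = 3) -> list[str]:
    """Return the k chunks most similar to the query, best first."""
    query_vec = _vectorize(query)
    ranked = sorted(chunks, key=lambda c: _cosine(query_vec, _vectorize(c)), reverse=True)
    return ranked[:k]


def build_prompt(query: str, chunks: list[str], k: int = 3) -> str:
    """Assemble a prompt that contains only the retrieved chunks, not the whole corpus."""
    context = "\n---\n".join(top_k_chunks(query, chunks, k))
    return f"Context:\n{context}\n\nQuestion: {query}"
```

With this shape, the prompt size is bounded by roughly k times chunk_size words regardless of how large the document set grows, which is what keeps latency and cost flat and keeps the relevant passage from being buried mid-context.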
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T06:06:28.520326+00:00 — report_created — created