Report #76734
[counterintuitive] large context windows eliminate the need for chunking and retrieval
Continue using RAG/chunking to place relevant information at the beginning or end of the prompt context, even for models with 100k+ token windows.
Journey Context:
With 128k+ context models, developers often dump entire documents into the prompt, assuming the LLM will find the needle. Research on the 'lost in the middle' effect shows that recall depends on position: LLMs tend to recall information near the start and end of the context most reliably, and accuracy drops for information placed in the middle. Brute-force context expansion without retrieval can therefore lower answer quality while raising cost and latency, since every extra token is processed on each request.
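A minimal sketch of the recommendation, assuming a toy word-overlap score stands in for a real embedding-based retriever. All function names here (chunk_text, overlap_score, order_for_edges, build_context) are hypothetical and not from any library. It chunks a document, keeps the top-scoring chunks, and orders them so the strongest matches sit at the start and end of the context, with the weakest in the middle.

```python
import re
from collections import Counter


def chunk_text(text: str, max_words: int = 50) -> list[str]:
    """Split text into consecutive chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def overlap_score(query: str, chunk: str) -> int:
    """Toy relevance score (assumption): count of word occurrences shared with the query."""
    q = Counter(re.findall(r"\w+", query.lower()))
    c = Counter(re.findall(r"\w+", chunk.lower()))
    return sum((q & c).values())


def order_for_edges(ranked: list[str]) -> list[str]:
    """Given chunks sorted best-first, put the best at the edges and the weakest in the middle."""
    front, back = [], []
    for i, chunk in enumerate(ranked):
        # Alternate: ranks 1, 3, 5... fill from the start; ranks 2, 4, 6... fill from the end.
        (front if i % 2 == 0 else back).append(chunk)
    return front + back[::-1]


def build_context(query: str, document: str, top_k: int = 4, max_words: int = 50) -> list[str]:
    """Retrieve the top_k chunks for the query and order them for prompt placement."""
    chunks = chunk_text(document, max_words)
    ranked = sorted(chunks, key=lambda ch: overlap_score(query, ch), reverse=True)[:top_k]
    return order_for_edges(ranked)
```

With five chunks ranked a (best) through e (worst), order_for_edges returns a, c, e, d, b: the best chunk opens the context, the second best closes it, and the weakest lands in the middle. The chunk size, top_k, and scoring function are placeholders to tune against your own model and data.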
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T11:23:07.877953+00:00— report_created — created