Report #51156
[counterintuitive] Do large context windows remove the need for chunking and retrieval strategies?
Continue using targeted retrieval and chunking even with large-context models; pass only highly relevant context to keep cost and latency low and to avoid degraded recall.
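As a rough illustration of passing only relevant context, the sketch below splits text into overlapping chunks, scores each chunk against the query, and keeps the best ones that fit a size budget. It is a minimal sketch under stated assumptions, not a production retriever: the helper names (`chunk_words`, `retrieve`), the bag-of-words cosine score, and the word-count budget are all hypothetical stand-ins. A real system would likely use embeddings and a proper tokenizer.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase word tokenizer; a stand-in for a real model tokenizer."""
    return re.findall(r"[a-z0-9]+", text.lower())


def chunk_words(text, chunk_size=50, overlap=10):
    """Split text into overlapping chunks of at most chunk_size words."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last chunk already reaches the end of the text
    return chunks


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, chunks, top_k=3, word_budget=200):
    """Return up to top_k matching chunks, best first, within word_budget."""
    q = Counter(tokenize(query))
    scored = [(cosine(q, Counter(tokenize(c))), c) for c in chunks]
    # Drop chunks that share no terms with the query, then rank by score.
    scored = sorted((s for s in scored if s[0] > 0), key=lambda s: s[0], reverse=True)
    selected, used = [], 0
    for _, chunk in scored[:top_k]:
        size = len(chunk.split())
        if used + size > word_budget:
            break
        selected.append(chunk)
        used += size
    return selected
```

Only the output of `retrieve` would then be placed in the prompt, rather than the full document set.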
Journey Context:
With models supporting 100k+ tokens, developers often dump entire document repositories into the prompt, assuming the model will 'find' the answer. This ignores three costs: the quadratic scaling of attention with sequence length (more latency and compute), the 'lost in the middle' effect (recall drops for information buried mid-context), and the dilution of the instruction signal. A model given 100k tokens of mostly irrelevant text typically performs worse and costs dramatically more than a model given 2k tokens of precisely retrieved text.
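To put rough numbers on the 100k versus 2k comparison, the back-of-the-envelope sketch below compares the two prompt sizes. It assumes standard self-attention, whose pairwise-score term grows with the square of the sequence length, and it assumes input is billed per token. Both function names are hypothetical, and real latency depends on the model, kernels, and caching, so treat the ratios as orders of magnitude only.

```python
def attention_cost_ratio(long_tokens, short_tokens):
    """Ratio of the quadratic self-attention term between two prompt sizes."""
    return (long_tokens / short_tokens) ** 2


def linear_cost_ratio(long_tokens, short_tokens):
    """Ratio of any per-token cost (for example, per-token input billing)."""
    return long_tokens / short_tokens


long_prompt, short_prompt = 100_000, 2_000
print(attention_cost_ratio(long_prompt, short_prompt))  # 2500.0
print(linear_cost_ratio(long_prompt, short_prompt))     # 50.0
```

Under these assumptions the long prompt carries about 50 times the per-token cost and about 2,500 times the attention-score work, before accounting for any loss in answer quality.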
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T16:21:05.231425+00:00— report_created — created