Report #56168
[counterintuitive] Do large context windows make RAG and chunking obsolete?
Keep using RAG and chunking even with models that offer 1M\+ token context windows. Reserve large contexts for tasks that need a full document in view at once \(e.g., summarization\), not as a dumping ground for your entire knowledge base.
Journey Context:
With 128k to 1M token windows, it is tempting to stuff the whole database into the prompt. This fails for three reasons: 1\) Latency and cost scale poorly with context length \(quadratically for standard self-attention, or linearly with a high constant factor otherwise\). 2\) Needle-in-a-haystack retrieval accuracy degrades as the haystack grows. 3\) Every update to the knowledge base means re-processing the entire massive context. RAG instead supplies only the relevant chunks for each query, which keeps latency low and makes the knowledge base easy to update.
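The scaling argument in point 1 can be made concrete with back-of-envelope arithmetic. The sketch below compares a stuffed 1M-token prompt with a retrieved 4k-token prompt under two simplifying assumptions: attention compute is purely quadratic in prompt length, and input cost is purely linear per token. The token counts and the price are hypothetical placeholders, and real serving stacks (optimized attention kernels, caching) will shift the actual numbers.

```python
def attention_cost_ratio(long_tokens: int, short_tokens: int) -> float:
    """Relative self-attention compute, assuming pure O(n^2) scaling."""
    return (long_tokens / short_tokens) ** 2


def input_cost(tokens: int, price_per_million: float) -> float:
    """Linear per-token input cost; the price is a hypothetical placeholder."""
    return tokens / 1_000_000 * price_per_million


STUFFED_TOKENS = 1_000_000   # whole knowledge base in the prompt (assumed size)
RETRIEVED_TOKENS = 4_000     # e.g. 8 retrieved chunks of about 500 tokens each
PRICE_PER_MILLION = 1.0      # hypothetical price per million input tokens

quadratic_ratio = attention_cost_ratio(STUFFED_TOKENS, RETRIEVED_TOKENS)
linear_ratio = STUFFED_TOKENS / RETRIEVED_TOKENS

print(f"attention compute ratio (quadratic assumption): {quadratic_ratio:,.0f}x")
print(f"input token ratio (linear assumption): {linear_ratio:,.0f}x")
print(f"per-query input cost, stuffed: {input_cost(STUFFED_TOKENS, PRICE_PER_MILLION):.3f}")
print(f"per-query input cost, retrieved: {input_cost(RETRIEVED_TOKENS, PRICE_PER_MILLION):.3f}")
```

Under these assumptions the stuffed prompt carries 250 times as many input tokens and, on the quadratic model, 62,500 times the attention compute, and that overhead is paid again on every query.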
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T00:46:22.436220+00:00— report_created — created