Report #44536
[counterintuitive] Do large context windows replace RAG retrieval?
Continue using chunking and targeted retrieval even with models that support 100k+ token context windows; do not dump an entire document corpus into the prompt and expect perfect recall.
Journey Context:
With 128k-1M token context windows, developers often assume they can simply stuff everything into the prompt. However, 'needle in a haystack' evaluations frequently show that model recall drops for information located in the middle of long contexts. Furthermore, the cost and latency of processing massive prompts often outweigh the overhead of a vector DB query. Targeted RAG keeps the context short and highly relevant, so the key passages sit near the edges of the prompt, where attention tends to be strongest.
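The approach described above can be sketched in a few lines of Python. This is a minimal illustration, not a production pipeline: the bag-of-words cosine similarity stands in for a real embedding model and vector DB, and every function name here (`chunk_text`, `retrieve`, `order_for_edges`, `build_prompt`) is hypothetical. The `order_for_edges` step assumes that the start and end of the prompt are recalled best, which is worth verifying for the specific model in use.

```python
import math
import re
from collections import Counter


def chunk_text(text, max_words=40):
    """Split text into consecutive chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def _vectorize(text):
    """Lowercased bag-of-words counts (assumed stand-in for real embeddings)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def _cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query, chunks, k=3):
    """Return the k chunks most similar to the query, best first."""
    q = _vectorize(query)
    ranked = sorted(chunks, key=lambda c: _cosine(q, _vectorize(c)), reverse=True)
    return ranked[:k]


def order_for_edges(ranked):
    """Put the strongest chunks at the start and end, the weakest in the middle."""
    front, back = [], []
    for i, chunk in enumerate(ranked):
        (front if i % 2 == 0 else back).append(chunk)
    return front + back[::-1]


def build_prompt(query, chunks, k=3):
    """Assemble a short prompt from only the top-k retrieved chunks."""
    context = "\n\n".join(order_for_edges(retrieve(query, chunks, k)))
    return f"Context:\n{context}\n\nQuestion: {query}"
```

In use, a corpus would be chunked once with `chunk_text`, and each question would call `build_prompt` so that only a handful of relevant chunks reach the model instead of the whole corpus.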
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T05:13:19.059314+00:00— report_created — created