Report #46112
[counterintuitive] Put entire documents in context instead of chunking for RAG
Continue chunking and ranking documents even with massive context windows; use long context for reasoning over retrieved chunks, not as a replacement for retrieval.
Journey Context:
Context windows of 128k to 1M tokens led some developers to abandon chunking and stuff entire codebases into prompts. However, LLMs are prone to the 'Lost in the Middle' effect, in which information placed in the middle of a long context is recalled less reliably than information near the beginning or end. Furthermore, processing 1M tokens on every query costs significantly more in latency and compute than a targeted RAG pipeline that sends only the top-ranked chunks. Long context is best used for aggregating and reasoning over already-retrieved information, not for brute-force search.
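The chunk, rank, then reason flow described above can be sketched roughly as follows. This is a minimal illustration, not a reference pipeline: the function names (`chunk_words`, `score`, `top_k`, `edges_first`, `build_prompt`) are hypothetical, and the term-count cosine score is an assumed stand-in for whatever embedding model or ranker a real system would use. The `edges_first` reordering is one possible way to keep the strongest chunks away from the middle of the context; it is an assumption of this sketch, not something the report prescribes.

```python
from collections import Counter
import math
import re


def tokenize(text):
    """Lowercase and split into alphanumeric tokens."""
    return re.findall(r"[a-z0-9_]+", text.lower())


def chunk_words(text, size=40, overlap=10):
    """Split text into overlapping windows of `size` words."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the last window already reached the end of the text
    return chunks


def score(query, chunk):
    """Cosine similarity over raw term counts (stand-in for embeddings)."""
    q, c = Counter(tokenize(query)), Counter(tokenize(chunk))
    dot = sum(q[t] * c[t] for t in q)
    norm = math.sqrt(sum(v * v for v in q.values())) * math.sqrt(
        sum(v * v for v in c.values())
    )
    return dot / norm if norm else 0.0


def top_k(query, chunks, k=3):
    """Return the k chunks that score highest against the query."""
    return sorted(chunks, key=lambda ch: score(query, ch), reverse=True)[:k]


def edges_first(ranked):
    """Put the best chunks at the start and end, the weakest in the middle."""
    front, back = [], []
    for i, ch in enumerate(ranked):
        (front if i % 2 == 0 else back).append(ch)
    return front + back[::-1]


def build_prompt(query, documents, k=3):
    """Retrieve first, then hand only the selected chunks to the model."""
    chunks = [ch for doc in documents for ch in chunk_words(doc)]
    selected = edges_first(top_k(query, chunks, k))
    context = "\n---\n".join(selected)
    return f"Context:\n{context}\n\nQuestion: {query}"
```

Calling `build_prompt("how is the retry timeout configured", docs)` yields a prompt containing only the top-ranked chunks, which is what the long-context model then reasons over. As a purely illustrative size comparison, sending 5 chunks of about 500 tokens each is 2,500 input tokens, roughly 400 times fewer than a 1M-token prompt.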
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T07:52:36.624188+00:00— report_created — created