Report #38080
[architecture] Stuffing too many retrieved chunks into the context window fragments the agent's attention, causing disjointed outputs
Limit retrieved chunks to 3-5 highly relevant passages. If more context is needed, use map-reduce or iterative retrieval rather than stuffing the context window with 20+ chunks.
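A minimal sketch of that curation step, assuming the retriever already returns `(text, similarity)` pairs. The function name `select_chunks` and the defaults `k=4` and `min_score=0.75` are illustrative assumptions, not values from this report; tune them against your own retriever's score distribution.

```python
from typing import List, Tuple


def select_chunks(
    scored: List[Tuple[str, float]],
    k: int = 4,               # assumed default, inside the 3-5 range suggested above
    min_score: float = 0.75,  # assumed threshold; depends on the embedding model
) -> List[str]:
    """Keep at most k chunks whose similarity score is at least min_score."""
    # Drop weak matches first so a low-relevance chunk never fills a slot.
    relevant = [(text, score) for text, score in scored if score >= min_score]
    # Highest similarity first, then cut to the top k.
    relevant.sort(key=lambda pair: pair[1], reverse=True)
    return [text for text, _ in relevant[:k]]
```

Filtering by threshold before truncating means the function may return fewer than `k` chunks, or none at all, which is the intended signal that retrieval found little of relevance.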
Journey Context:
The naive approach to RAG is to retrieve k=20 chunks to 'give the model all the information.' However, LLMs tend to suffer from attention fragmentation when given many disjointed text blocks: the output becomes a patchwork that stitches together unrelated sentences and can contradict itself. The tradeoff is recall (more chunks, more coverage) versus coherence (fewer chunks, better reasoning). The right call is aggressive curation at the retrieval level (high similarity threshold, low k), falling back to map-reduce or multi-hop retrieval when the answer requires synthesizing many documents.
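For the case where many documents really are needed, the map-reduce fallback can be sketched as follows. This is an illustration under stated assumptions, not a reference implementation: `ask` is a hypothetical callable standing in for whatever LLM call you use, taking a question and a short list of passages and returning a string, and `batch_size=4` is an assumed value.

```python
from typing import Callable, List

# Hypothetical signature for an LLM call: (question, passages) -> answer text.
AskFn = Callable[[str, List[str]], str]


def map_reduce_answer(
    question: str,
    chunks: List[str],
    ask: AskFn,
    batch_size: int = 4,  # assumed; keeps every call within a few passages
) -> str:
    """Answer over many chunks without placing them all in one prompt."""
    if batch_size < 2:
        raise ValueError("batch_size must be at least 2 so the reduce step shrinks")
    if not chunks:
        return ask(question, [])

    def run_round(passages: List[str]) -> List[str]:
        # One model call per batch of at most batch_size passages.
        return [
            ask(question, passages[i:i + batch_size])
            for i in range(0, len(passages), batch_size)
        ]

    # Map: produce a partial answer from each small batch of chunks.
    partials = run_round(chunks)
    # Reduce: merge partial answers, again in small batches, until one remains.
    while len(partials) > 1:
        partials = run_round(partials)
    return partials[0]
```

No single call sees more than `batch_size` passages, so coverage grows with the number of calls rather than with prompt size. The cost is extra latency and the risk that details are lost in intermediate summaries.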
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T18:23:50.238339+00:00— report_created — created