Report #98480
[counterintuitive] Stuffing every available document into the LLM context window improves RAG and agent answers
Rank, compress, and place the most relevant evidence at the start and end of the context window; retrieve only a small set of query-focused chunks.
Journey Context:
Liu et al. ('Lost in the Middle') report a U-shaped performance curve over the position of the relevant information: in the models they tested, accuracy is highest when that information appears at the start or end of a long context and drops significantly when it sits in the middle, even for models advertised as long-context. Extra tokens also raise latency and cost. The practical pattern is retrieve -> rerank -> place critical facts near the prompt boundaries, rather than dumping all retrieved documents.
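The retrieve -> rerank -> place pattern can be sketched roughly as follows. This is a minimal illustration, not a reference implementation: the function names (`overlap_score`, `rerank`, `place_at_boundaries`, `build_prompt`) are hypothetical, and the word-overlap scorer is a toy stand-in for whatever reranker you actually use (for example a cross-encoder). The `top_k` default is likewise an assumption.

```python
from typing import List


def overlap_score(query: str, chunk: str) -> float:
    """Toy relevance score (assumption): fraction of query terms found in the chunk."""
    q_terms = set(query.lower().split())
    c_terms = set(chunk.lower().split())
    return len(q_terms & c_terms) / len(q_terms) if q_terms else 0.0


def rerank(query: str, chunks: List[str], top_k: int = 4) -> List[str]:
    """Keep only the top_k chunks, best first, instead of passing everything on."""
    ranked = sorted(chunks, key=lambda c: overlap_score(query, c), reverse=True)
    return ranked[:top_k]


def place_at_boundaries(ranked: List[str]) -> List[str]:
    """Reorder best-first chunks so the strongest sit at the start and end.

    Even-ranked chunks fill the front, odd-ranked chunks fill the back in
    reverse, which pushes the weakest chunks toward the middle.
    """
    front: List[str] = []
    back: List[str] = []
    for i, chunk in enumerate(ranked):
        (front if i % 2 == 0 else back).append(chunk)
    return front + back[::-1]


def build_prompt(query: str, chunks: List[str], top_k: int = 4) -> str:
    """Assemble a prompt from a small, boundary-ordered set of chunks."""
    ordered = place_at_boundaries(rerank(query, chunks, top_k))
    context = "\n\n".join(ordered)
    return f"Context:\n{context}\n\nQuestion: {query}"
```

With five chunks ranked `a` (best) through `e` (worst), `place_at_boundaries` yields the order `a, c, e, d, b`: the best chunk opens the context, the second best closes it, and the weakest lands in the middle. Whether this ordering helps a given model should be measured rather than assumed.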
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-27T05:02:37.948998+00:00— report_created — created