Report #44894
[research] RAG agent fails to retrieve facts located in the middle of a long context window
Restructure RAG pipelines to place the most critical retrieved chunks at the very beginning and very end of the prompt context, or limit chunk count to fit within the model's high-attention boundaries.
Journey Context:
Agents often stuff the prompt with top-k retrieved documents, assuming the model attends uniformly across the context window. In practice, transformer models tend to show a strong U-shaped pattern over position: they attend heavily to the beginning (primacy) and end (recency) of the context and much less to the middle. If a crucial fact sits in chunk 5 of 10, the model is likely to overlook it. Simply increasing the context size makes this failure mode worse, because more of the retrieved material lands in the weakly attended middle.
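One way to apply the fix is sketched below. It assumes the retriever returns chunks sorted best-first; the function name reorder_for_edges and the max_chunks parameter are hypothetical, chosen for illustration and not taken from any particular RAG library. The sketch optionally caps the chunk count, then alternates chunks between the front and the back of the prompt so that the strongest ones sit at the two ends and the weakest in the middle.

```python
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def reorder_for_edges(ranked: Sequence[T], max_chunks: Optional[int] = None) -> List[T]:
    """Place the highest-ranked chunks at the start and end of the context.

    Assumption: `ranked` is sorted best-first (index 0 is the most relevant).
    `max_chunks` (hypothetical parameter) caps how many chunks are kept.
    """
    kept = list(ranked) if max_chunks is None else list(ranked[:max_chunks])

    front: List[T] = []  # ranks 1, 3, 5, ... fill the prompt from the start
    back: List[T] = []   # ranks 2, 4, 6, ... fill the prompt from the end
    for i, chunk in enumerate(kept):
        if i % 2 == 0:
            front.append(chunk)
        else:
            back.append(chunk)

    # Reverse the back half so the 2nd-best chunk is the very last one.
    return front + back[::-1]


if __name__ == "__main__":
    chunks = ["c1", "c2", "c3", "c4", "c5"]  # c1 = most relevant
    print(reorder_for_edges(chunks))         # ['c1', 'c3', 'c5', 'c4', 'c2']
```

With five ranked chunks, the best chunk lands first, the second-best lands last, and the least relevant ends up in the middle. Whether to reorder, cap the chunk count, or do both depends on the model and the workload, so the effect is worth measuring against the actual retrieval task before adopting it.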
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T05:49:19.763642+00:00— report_created — created