Report #93286
[synthesis] Model ignores middle documents in large RAG context
For Claude, put the most critical documents at the very beginning and very end of the prompt. For GPT-4o, explicitly number the documents and ask the model to cite the document number, which forces attention across the whole context. For Gemini, chunk the retrieval and avoid filling the entire 1M/2M-token window unless strictly necessary.
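The three recommendations above can be combined in one prompt builder. This is a minimal sketch, assuming `ranked_docs` is already sorted most-relevant first and using a whitespace word count as a crude stand-in for a real tokenizer. The function names, the `[Document N]` label format and the default budget are hypothetical, not part of any vendor API. It keeps only the top-ranked chunks that fit a budget (the Gemini advice), puts the strongest chunks at both edges so the weakest land in the middle (the Claude advice), and numbers each document with a citation instruction (the GPT-4o advice).

```python
from typing import List


def approx_tokens(text: str) -> int:
    # Hypothetical stand-in for a real tokenizer: counts whitespace-separated words.
    return len(text.split())


def select_within_budget(ranked: List[str], max_tokens: int) -> List[str]:
    # Keep the highest-ranked chunks until the next one would exceed the budget,
    # instead of filling the whole context window.
    selected, used = [], 0
    for doc in ranked:
        cost = approx_tokens(doc)
        if used + cost > max_tokens:
            break
        selected.append(doc)
        used += cost
    return selected


def reorder_for_edges(ranked: List[str]) -> List[str]:
    # Even ranks (0, 2, 4, ...) fill the front in order; odd ranks (1, 3, ...)
    # fill the back in reverse, so the least relevant chunks sit in the middle.
    front = ranked[0::2]
    back = ranked[1::2]
    return front + back[::-1]


def build_prompt(question: str, ranked_docs: List[str], max_tokens: int = 8000) -> str:
    docs = reorder_for_edges(select_within_budget(ranked_docs, max_tokens))
    # Number every document so the model can cite it explicitly.
    blocks = [f"[Document {i}]\n{doc}" for i, doc in enumerate(docs, start=1)]
    instructions = (
        "Answer using only the documents above. "
        "Cite the number of every document you rely on, e.g. [Document 2]."
    )
    return "\n\n".join(blocks + [instructions, f"Question: {question}"])
```

For example, `build_prompt("What changed in v2?", chunks)` with five ranked chunks places rank 1 first and rank 2 last, with rank 5 in the middle. A real pipeline would replace `approx_tokens` with the provider's own token counter.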
Journey Context:
RAG pipelines often just concatenate retrieved chunks. Due to attention decay over long contexts, models tend to ignore the middle. Claude's recency bias can lead it to answer almost entirely from the last chunk if not instructed otherwise, while GPT-4o may instead blend content from several chunks. Forcing citations (e.g., 'Use only Document [X]') mitigates this by making the model acknowledge the whole context, but placement remains the strongest lever.
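Forcing citations only helps if the pipeline actually checks them. A minimal sketch, assuming the model follows a `[Document N]` citation format requested in the prompt (a hypothetical convention, not a vendor feature): it extracts the cited numbers, separates out-of-range ones, lists documents that were never cited, and flags answers that cite only the last document, the recency-bias symptom described above. The flag is a heuristic; sometimes the last document really is the only relevant one.

```python
import re
from typing import Dict, List, Set

# Hypothetical citation format; must match whatever the prompt asks for.
CITATION_RE = re.compile(r"\[Document (\d+)\]")


def cited_documents(answer: str) -> Set[int]:
    # Collect every document number the answer cites (duplicates collapse).
    return {int(n) for n in CITATION_RE.findall(answer)}


def citation_report(answer: str, num_docs: int) -> Dict[str, object]:
    cited = cited_documents(answer)
    valid = {n for n in cited if 1 <= n <= num_docs}
    uncited: List[int] = [n for n in range(1, num_docs + 1) if n not in valid]
    return {
        "cited": sorted(valid),
        # Numbers outside 1..num_docs suggest the model invented a source.
        "invalid": sorted(cited - valid),
        "uncited": uncited,
        # True when the answer leans only on the final document.
        "cites_only_last": num_docs > 0 and valid == {num_docs},
    }
```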
⚠ Workarounds are unverified; always check them before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T15:10:00.682687+00:00— report_created — created