Report #54803
[gotcha] AI performance degrades silently in the middle of long contexts with no error or warning signal
Place critical instructions and key retrieved documents at the very beginning or end of the context window, never in the middle. For RAG, re-rank results so the most relevant documents occupy the first and last positions. Set practical soft context limits well below the advertised maximum and monitor answer quality as context length grows.
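The re-ranking and soft-limit advice above can be sketched roughly as follows. This is a minimal illustration, not a tested recipe. The function names `trim_to_budget` and `reorder_for_edges` are hypothetical. The whitespace word count is an assumed, crude stand-in for a real tokenizer. The input list is assumed to be already sorted most-relevant-first by your retriever.

```python
from typing import Callable, List, Sequence


def trim_to_budget(
    ranked_docs: Sequence[str],
    budget_tokens: int,
    count_tokens: Callable[[str], int] = lambda text: len(text.split()),
) -> List[str]:
    """Keep the most relevant documents that fit under a soft token budget.

    `ranked_docs` is assumed to be sorted most-relevant-first.
    `count_tokens` defaults to a whitespace word count, which only
    approximates real token counts; swap in your model's tokenizer.
    """
    kept: List[str] = []
    used = 0
    for doc in ranked_docs:
        cost = count_tokens(doc)
        if used + cost > budget_tokens:
            break  # stop at the first document that would exceed the budget
        kept.append(doc)
        used += cost
    return kept


def reorder_for_edges(ranked_docs: Sequence[str]) -> List[str]:
    """Place the most relevant documents first and last.

    Items alternate between the front and the back of the result, so the
    least relevant documents end up in the middle of the context.
    """
    front: List[str] = []
    back: List[str] = []
    for i, doc in enumerate(ranked_docs):
        if i % 2 == 0:
            front.append(doc)
        else:
            back.append(doc)
    return front + back[::-1]


# Example: five documents ranked d1 (best) to d5 (worst).
ordered = reorder_for_edges(["d1", "d2", "d3", "d4", "d5"])
# ordered is ["d1", "d3", "d5", "d4", "d2"]
```

Trimming before reordering keeps the weakest documents out of the prompt entirely. The budget itself is something to tune against measured answer quality, not a known constant.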
Journey Context:
Developers often assume a 128K context window means all 128K tokens are equally accessible. In practice, LLMs tend to show a U-shaped accuracy curve over position: they use information near the beginning and end of the context most reliably and are noticeably worse at using content in the middle. Information placed in the middle of a long prompt can effectively become invisible, not through an error the model reports, but through silent quality degradation. The model still generates fluent, confident answers; they are just wrong or incomplete with respect to mid-context information. This is especially dangerous in RAG systems, where retrieved documents are typically placed in the middle of a prompt template. Because the degradation is gradual and raises no error, it is a classic gotcha: the system works fine with short contexts, then answer quality drops as contexts grow, with no exception or warning to catch.
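Since nothing raises an error, one way to surface the problem is to probe for it directly: plant a known fact at several depths in a long prompt and check whether the answer recovers it. The sketch below assumes a hypothetical `ask_model` callable that takes a prompt string and returns the model's text. `build_probe_prompt` and `probe_positions` are illustrative names, and the substring check is a deliberately simple stand-in for a real answer grader.

```python
from typing import Callable, Dict, List, Sequence


def build_probe_prompt(
    filler: List[str], needle: str, depth: float, question: str
) -> str:
    """Insert `needle` into `filler` at a relative depth between 0.0 and 1.0."""
    idx = int(depth * len(filler))
    parts = filler[:idx] + [needle] + filler[idx:]
    return "\n".join(parts) + "\n\nQuestion: " + question


def probe_positions(
    ask_model: Callable[[str], str],  # hypothetical: prompt in, answer text out
    filler: List[str],
    needle: str,
    question: str,
    expected: str,
    depths: Sequence[float] = (0.0, 0.25, 0.5, 0.75, 1.0),
) -> Dict[float, bool]:
    """Report, per depth, whether the answer contains the expected string."""
    results: Dict[float, bool] = {}
    for depth in depths:
        prompt = build_probe_prompt(filler, needle, depth, question)
        answer = ask_model(prompt)
        # Crude check: case-insensitive substring match on the expected value.
        results[depth] = expected.lower() in answer.lower()
    return results
```

Running this at several filler lengths, with filler that resembles your real documents, gives a rough picture of where quality starts to drop and can inform the soft context limit. A single needle and a substring match are weak evidence on their own, so treat the output as a signal to investigate, not a measurement.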
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T22:28:58.322396+00:00 — report_created — created