Report #37977
[counterintuitive] Why does adding more relevant context to the prompt make the model's answers worse?
Be selective about context: include only the most relevant information. For RAG, prefer 3-5 high-relevance chunks over 10+ marginal ones. Measure output quality as you add context; if it degrades, remove context rather than adding more. Put the most important context at the beginning or end of the prompt, not the middle.
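The selection and placement advice above can be sketched as follows. This is a minimal illustration, not a reference implementation: the function name `select_and_order`, the `(text, score)` input shape, and the defaults `k=4` and `min_score=0.5` are all assumptions made for the example, and the relevance scores are presumed to come from whatever retriever or reranker is already in use.

```python
from typing import List, Tuple


def select_and_order(
    chunks: List[Tuple[str, float]],
    k: int = 4,              # hypothetical default, inside the 3-5 range suggested above
    min_score: float = 0.5,  # hypothetical relevance cutoff; tune for your retriever
) -> List[str]:
    """Keep at most k chunks scoring >= min_score, strongest at the edges."""
    # Drop marginal chunks first, then keep the k highest-scoring survivors.
    kept = sorted(
        (c for c in chunks if c[1] >= min_score),
        key=lambda c: c[1],
        reverse=True,
    )[:k]

    # Alternate ranks between the front and the back of the prompt:
    # rank 0 -> front, rank 1 -> back, rank 2 -> front, and so on.
    front: List[str] = []
    back: List[str] = []
    for rank, (text, _score) in enumerate(kept):
        (front if rank % 2 == 0 else back).append(text)

    # Reversing `back` puts the second-best chunk last, so the weakest
    # of the kept chunks end up in the middle.
    return front + back[::-1]
```

With scores of 0.9, 0.8, 0.7, and 0.6 for chunks a, b, c, and d, the result is a, c, d, b: the best chunk opens the context, the second best closes it, and the two weakest sit in the middle. Chunks below the cutoff are dropped even if fewer than `k` remain.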
Journey Context:
The intuition that more information yields better answers is deeply ingrained. With LLMs, it is often wrong. Adding context can degrade performance through three mechanisms: (1) Attention dilution: for each query, softmax attention weights sum to 1, so the same budget is spread across more tokens and less of it lands on the most relevant information. (2) Conflicting signals: more context raises the probability of contradictions or ambiguities that the model must resolve, and it often resolves them incorrectly. (3) Instruction dilution: as the ratio of context tokens to instruction tokens grows, the model's adherence to formatting and task instructions weakens. This is not a bug but a property of how transformer attention works: adding keys and values changes the attention distribution even over tokens that were already present. A model that answers a question correctly with 500 tokens of context may fail with 5000 tokens, even if the additional 4500 tokens are relevant.
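The attention-dilution mechanism can be illustrated with a toy softmax calculation. This is a simplified sketch, not a model of any real transformer: it assumes a single query, one key with a high score, and a variable number of other keys that all share the same lower score. The function names and the score values are made up for the example.

```python
import math
from typing import List


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of raw attention scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def weight_on_relevant(relevant_score: float, other_score: float, n_others: int) -> float:
    """Attention weight that one high-scoring key receives among n_others lower-scoring keys."""
    scores = [relevant_score] + [other_score] * n_others
    return softmax(scores)[0]


# Same relevant key, same scores; only the number of competing keys changes.
few = weight_on_relevant(3.0, 1.0, 9)    # 10 keys in total, weight is about 0.45
many = weight_on_relevant(3.0, 1.0, 99)  # 100 keys in total, weight is about 0.07
```

The relevant key's raw score never changes, yet its share of the attention drops from roughly 0.45 to roughly 0.07 when the number of competing keys grows from 9 to 99. Real models have many heads and layers and learned scores, so the effect is less mechanical in practice, but the normalization that drives it is the same.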
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T18:13:06.870700+00:00— report_created — created