Report #95501
[counterintuitive] Why does adding more relevant context or documentation to the prompt make the model's answers worse?
Minimize context to only the information the task needs. Measure task performance as you add context; it often peaks and then declines. Use retrieval to select the single most relevant chunk rather than the top-K chunks, and add further chunks only if measurement shows they help. When in doubt, less context with higher relevance beats more context with lower average relevance.
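One way to measure this is to sweep the number of retrieved chunks and keep the smallest count that gives the best score. The sketch below is a minimal illustration, not a definitive implementation: `sweep_context_size`, `toy_evaluate`, and the sample chunks are hypothetical names introduced here, and the toy scoring function stands in for a real evaluation set that you would have to supply.

```python
from typing import Callable, List, Sequence, Tuple


def sweep_context_size(
    ranked_chunks: Sequence[str],
    evaluate: Callable[[List[str]], float],
    max_k: int,
) -> Tuple[int, List[Tuple[int, float]]]:
    """Score the task with the top-1, top-2, ..., top-max_k chunks.

    ranked_chunks must already be sorted most relevant first.
    evaluate is a hypothetical callback: it should run your eval set with
    the given chunks in the prompt and return a score (higher is better).
    """
    limit = min(max_k, len(ranked_chunks))
    if limit < 1:
        raise ValueError("need at least one chunk and max_k >= 1")
    curve = [(k, evaluate(list(ranked_chunks[:k]))) for k in range(1, limit + 1)]
    # max() returns the first maximal item, so ties resolve to the smaller k.
    best_k = max(curve, key=lambda pair: pair[1])[0]
    return best_k, curve


# Toy stand-in for a real evaluation (an assumption for illustration only):
# it rewards chunks tagged "relevant" and charges a cost for every chunk.
def toy_evaluate(chunks: List[str]) -> float:
    hits = sum(1 for c in chunks if c.startswith("relevant"))
    return hits - 0.3 * len(chunks)


chunks = [
    "relevant: auth flow",
    "relevant: token refresh",
    "changelog",
    "license",
    "faq",
]
best_k, curve = sweep_context_size(chunks, toy_evaluate, max_k=5)
for k, score in curve:
    print(k, score)
print("best k:", best_k)
```

With a real `evaluate`, the printed curve shows where performance peaks for your task, which is more reliable than assuming a fixed K.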
Journey Context:
The intuition 'more information = better answers' is deeply ingrained. With large context windows, developers dump entire codebases, multiple documents, or exhaustive API references into prompts. But attention is a finite budget: its weights are shared across every token in the context, so marginally relevant material dilutes the weight available for the critical information, and the signal-to-noise ratio drops. Research such as 'Lost in the Middle' (Liu et al., 2023) found that models use information at the start and end of a long context more reliably than information buried in the middle, and that accuracy can fall as more documents are added. The decline is not necessarily gradual. Beyond some threshold, more context can actively degrade performance because the model attends to irrelevant tokens and generates plausible-sounding confabulations that blend multiple pieces of context. The counterintuitive reality: a model with 500 tokens of highly relevant context can outperform the same model with 50k tokens of broadly relevant context.
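The dilution argument can be made concrete with a deliberately simplified model: a single softmax over one relevant token and N equally scored distractor tokens. This is a toy sketch under stated assumptions (one attention head, one relevant token, identical distractor scores, and a hypothetical function name `attention_on_relevant`); real models have many heads and layers, so it illustrates the direction of the effect, not its size.

```python
import math


def attention_on_relevant(
    relevant_logit: float,
    n_distractors: int,
    distractor_logit: float = 0.0,
) -> float:
    """Softmax weight on one relevant token competing with n distractors.

    Toy model: weight = e^r / (e^r + n * e^d), where r and d are the
    pre-softmax scores of the relevant and distractor tokens.
    """
    relevant = math.exp(relevant_logit)
    noise = n_distractors * math.exp(distractor_logit)
    return relevant / (relevant + noise)


# The relevant token keeps the same score; only the amount of
# surrounding context changes.
for n in (500, 5_000, 50_000):
    print(n, round(attention_on_relevant(5.0, n), 4))
```

In this toy model, the share of attention on the relevant token shrinks as distractors are added even though its own score never changes, which is the signal-to-noise effect described above.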
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T18:52:35.581090+00:00— report_created — created