Report #31501
[cost_intel] Stuffing maximum context into prompts on the assumption that more information always improves quality
Curate context ruthlessly. For RAG, retrieve 3-5 highly relevant chunks rather than 20 marginal ones. Place critical information at the beginning and end of the context window. Quality often improves with less, better-targeted context, while input cost falls linearly with every token removed.
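A minimal Python sketch of the first two recommendations, assuming the retriever already returns (chunk, score) pairs where a higher score means more relevant. `select_top_k`, `order_for_edges`, `k=4`, and the 0.3 score cutoff are hypothetical choices for illustration, not a specific library's API. The ordering puts the best chunk first, the second-best last, and pushes the weakest toward the middle; this is one common heuristic for working around the positional effect, not the only one.

```python
from typing import List, Tuple

# Hypothetical input shape: (chunk_text, relevance_score) pairs from any
# retriever, where a higher score means more relevant.
Scored = Tuple[str, float]


def select_top_k(scored: List[Scored], k: int = 4, min_score: float = 0.0) -> List[Scored]:
    """Drop chunks below min_score, then keep the k best, best first."""
    kept = [c for c in scored if c[1] >= min_score]
    kept.sort(key=lambda c: c[1], reverse=True)
    return kept[:k]


def order_for_edges(ranked: List[Scored]) -> List[str]:
    """Put the strongest chunks at the edges and the weakest in the middle.

    Expects `ranked` sorted best-first: rank 1 goes first, rank 2 last,
    rank 3 second, rank 4 second-to-last, and so on.
    """
    front: List[str] = []
    back: List[str] = []
    for i, (text, _score) in enumerate(ranked):
        (front if i % 2 == 0 else back).append(text)
    return front + back[::-1]


chunks = [("c1", 0.91), ("c2", 0.40), ("c3", 0.88),
          ("c4", 0.75), ("c5", 0.12), ("c6", 0.83)]
ranked = select_top_k(chunks, k=4, min_score=0.3)
print(order_for_edges(ranked))  # ['c1', 'c6', 'c4', 'c3']
```

Here the two strongest chunks (c1, c3) land at the edges of the context, and the cutoff plus `k` keep the prompt to a handful of chunks instead of everything the retriever returned.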
Journey Context:
The 'lost in the middle' phenomenon (Liu et al., 2023) shows that LLMs use information at the beginning and end of a long context more reliably than information in the middle: accuracy drops significantly when the relevant passage sits in a middle position. More context also raises cost, which is linear in input tokens, and increases latency. The common anti-pattern in RAG is retrieving 20 chunks 'just in case,' which multiplies retrieved-context tokens (and the cost that scales with them) by roughly 4-7x compared with 3-5 chunks, while reducing answer quality whenever the key information lands in a middle position. A better strategy is to retrieve fewer, higher-quality chunks (by improving the embedding model and chunking strategy), place the most critical information at the start and end of the context, and put the query itself near the end, where models tend to use it most reliably. This reduces cost and, in many cases, improves quality at the same time.
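The cost and query-placement points can be made concrete with a small Python sketch. All figures (500-token chunks, 300 tokens of instructions plus query, a 3.00 USD per million input-token price) are hypothetical placeholders, and `prompt_tokens`, `input_cost_usd`, and `build_prompt` are illustrative helpers, not part of any library. It shows input cost growing linearly with chunk count and assembles a prompt with the query placed last.

```python
# Hypothetical figures for illustration only; substitute your own.
TOKENS_PER_CHUNK = 500        # assumed average retrieved chunk size
OVERHEAD_TOKENS = 300         # assumed instructions + query
USD_PER_MILLION_INPUT = 3.00  # placeholder input price, not a real quote


def prompt_tokens(num_chunks: int) -> int:
    """Input tokens grow linearly with the number of retrieved chunks."""
    return num_chunks * TOKENS_PER_CHUNK + OVERHEAD_TOKENS


def input_cost_usd(num_tokens: int) -> float:
    """Input cost is linear in input tokens."""
    return num_tokens * USD_PER_MILLION_INPUT / 1_000_000


def build_prompt(instructions: str, ordered_chunks: list[str], query: str) -> str:
    """Instructions first, retrieved context in the middle, the query last."""
    context = "\n\n".join(f"[{i + 1}] {c}" for i, c in enumerate(ordered_chunks))
    return f"{instructions}\n\nContext:\n{context}\n\nQuestion: {query}"


wide, narrow = prompt_tokens(20), prompt_tokens(4)
print(wide, narrow)                                              # 10300 2300
print(f"{input_cost_usd(wide) / input_cost_usd(narrow):.1f}x")  # 4.5x
print(build_prompt("Answer using only the context.", ["chunk A", "chunk B"], "What changed?"))
```

With these placeholder numbers, 20 chunks cost about 4.5x as much input as 4; the exact multiple depends on chunk size, fixed prompt overhead, and how many chunks the narrower retrieval keeps. Real token counts should come from the target model's own tokenizer rather than a per-chunk average.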
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T07:15:40.382085+00:00 - report_created - created