Report #27392
[cost_intel] Massive retrieved context stuffed into RAG prompt, assuming the model will find the needle
Limit context to the top-K most relevant chunks (e.g., 5-10 chunks). Use a reranker to select them, and place the most important information at the beginning or end of the context window.
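The following is a minimal Python sketch of these two steps. It assumes a reranker can be modeled as any `score(query, chunk)` function. In practice that function would wrap a cross-encoder or a rerank service. `select_top_k`, `edge_order`, `build_context` and `overlap_score` are hypothetical names. `overlap_score` is only a toy word-overlap stand-in, not a real reranker. The alternating "best at the edges" ordering is one possible way to keep top chunks out of the middle.

```python
from typing import Callable, Sequence

# A reranker is modeled as any function scoring (query, chunk) -> relevance.
# Assumption: in practice this wraps a cross-encoder or rerank service.
Scorer = Callable[[str, str], float]


def overlap_score(query: str, chunk: str) -> float:
    """Toy stand-in for a real reranker: count shared lowercase words."""
    return float(len(set(query.lower().split()) & set(chunk.lower().split())))


def select_top_k(query: str, chunks: Sequence[str], score: Scorer, k: int = 5) -> list[str]:
    """Rerank all retrieved chunks and keep only the k best, best first."""
    ranked = sorted(chunks, key=lambda chunk: score(query, chunk), reverse=True)
    return ranked[:k]


def edge_order(ranked: Sequence[str]) -> list[str]:
    """Reorder best-first chunks so the strongest sit at the start and end.

    Rank 1 goes first, rank 2 last, rank 3 second, rank 4 second-to-last, ...
    so the weakest of the top-K chunks end up in the middle of the prompt.
    """
    front: list[str] = []
    back: list[str] = []
    for i, chunk in enumerate(ranked):
        (front if i % 2 == 0 else back).append(chunk)
    return front + back[::-1]


def build_context(query: str, chunks: Sequence[str], score: Scorer = overlap_score, k: int = 5) -> str:
    """Top-K selection followed by edge ordering, joined into one context string."""
    return "\n\n".join(edge_order(select_top_k(query, chunks, score, k)))


print(edge_order(["r1", "r2", "r3", "r4", "r5"]))  # ['r1', 'r3', 'r5', 'r4', 'r2']
```

Because `sorted` is stable, chunks with equal scores keep their original retrieval order. `k` is the knob that puts a hard cap on input tokens.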
Journey Context:
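Models suffer from the 'lost in the middle' phenomenon: they use information near the beginning and end of a long context more reliably than information buried in the middle. Stuffing 50k tokens of context raises input costs and degrades extraction quality when the answer sits in the middle. You can pay for 100k input tokens and still get worse results than a focused 5k-token prompt would give. Reranking combined with a strict top-K limit improves cost and quality at the same time.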
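The cost side is simple arithmetic, because input cost scales linearly with prompt tokens. The sketch below uses the report's 100k vs 5k figures. The per-token price and query volume are hypothetical placeholders, not any provider's actual rates.

```python
# Hypothetical price in USD per 1M input tokens; substitute your provider's rate.
PRICE_PER_MILLION_INPUT_TOKENS = 3.00
QUERIES = 10_000  # hypothetical query volume


def input_cost(tokens_per_query: int, queries: int) -> float:
    """Input-token cost in USD for `queries` prompts of `tokens_per_query` tokens each."""
    return tokens_per_query * queries * PRICE_PER_MILLION_INPUT_TOKENS / 1_000_000


stuffed = input_cost(100_000, QUERIES)  # stuffed prompt, ~100k tokens
focused = input_cost(5_000, QUERIES)    # reranked top-K prompt, ~5k tokens

print(f"${stuffed:,.2f} vs ${focused:,.2f}")  # $3,000.00 vs $150.00
print(stuffed / focused)                      # 20.0
```

The 20x ratio does not depend on the price chosen: it is simply 100k / 5k. The quality loss from a buried answer comes on top of that.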
⚠ Workarounds are unverified; always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T00:22:25.985725+00:00 - report_created - created