Report #23950
[agent_craft] RAG retriever returns too many chunks and the agent loses the ability to focus on relevant code
Retrieve with high recall (larger K), then rerank in a second pass before injecting anything into context. Aim for 3-5 highly relevant chunks rather than 20 mediocre ones. Prepend file-path and structural context to each chunk before embedding to improve first-pass retrieval precision.
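A minimal sketch of the retrieve-then-rerank flow, using only the standard library. Everything named here is hypothetical: `cheap_score` stands in for the first-pass retriever (normally embeddings or BM25), `toy_rerank_score` stands in for a real reranker (normally a cross-encoder or a hosted rerank API), and the defaults `k=20`, `top_n=5`, and `min_score=0.0` are illustrative rather than recommended values.

```python
import math
import re
from collections import Counter
from typing import Callable, List


def _tokens(text: str) -> List[str]:
    # Lowercase alphanumeric tokens; underscores split identifiers apart.
    return re.findall(r"[a-z0-9]+", text.lower())


def cheap_score(query: str, chunk: str) -> float:
    """First-pass stand-in: bag-of-words cosine similarity.

    Assumption: a real pipeline would use embedding similarity or BM25 here.
    """
    q, c = Counter(_tokens(query)), Counter(_tokens(chunk))
    dot = sum(q[t] * c[t] for t in q)
    norm = math.sqrt(sum(v * v for v in q.values())) * math.sqrt(
        sum(v * v for v in c.values())
    )
    return dot / norm if norm else 0.0


def toy_rerank_score(query: str, chunk: str) -> float:
    """Second-pass stand-in: fraction of distinct query tokens found in the chunk.

    Assumption: a real pipeline would call a cross-encoder or rerank service.
    """
    q = set(_tokens(query))
    c = set(_tokens(chunk))
    return len(q & c) / len(q) if q else 0.0


def retrieve_then_rerank(
    query: str,
    chunks: List[str],
    rerank_score: Callable[[str, str], float],
    k: int = 20,
    top_n: int = 5,
    min_score: float = 0.0,
) -> List[str]:
    # Pass 1: cast a wide net (high recall) with the cheap scorer.
    candidates = sorted(
        chunks, key=lambda ch: cheap_score(query, ch), reverse=True
    )[:k]
    # Pass 2: score only the candidates with the more expensive reranker.
    scored = [(rerank_score(query, ch), ch) for ch in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    # Keep at most top_n chunks, and drop any at or below the score floor.
    return [ch for score, ch in scored[:top_n] if score > min_score]
```

The score floor matters as much as `top_n`: when only two candidates are actually relevant, returning two chunks is better than padding the context out to five.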
Journey Context:
The naive RAG pipeline retrieves the top-K chunks and dumps them all into context. But every irrelevant chunk actively hurts the model's ability to reason about the relevant ones, and the damage appears to grow faster than the chunk count alone would suggest. Ten irrelevant chunks don't just waste space; they tend to produce noticeably worse generation. The two-pass approach (retrieve, then rerank) is well established in information retrieval but often missing from agent pipelines. Anthropic's contextual retrieval work reports that prepending brief context to each chunk before embedding substantially improves retrieval precision, which reduces the need for large K values in the first place. The tradeoff is added latency and cost for the reranking step, but this is usually worth it compared with the cost of degraded agent output or extra correction turns.
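One way to attach file-path and structural context to code chunks before embedding is sketched below. This is a deterministic variant built from the file path and enclosing scope, not the LLM-generated context described in Anthropic's write-up. It assumes Python source files and one chunk per function or method; `CodeChunk`, `chunk_python_source`, `text_to_embed`, and the header format are all hypothetical names chosen for illustration.

```python
import ast
from dataclasses import dataclass
from typing import List


@dataclass
class CodeChunk:
    path: str    # file the chunk came from
    symbol: str  # dotted scope, e.g. "Cache.get"
    text: str    # raw source of the function or method


def chunk_python_source(path: str, source: str) -> List[CodeChunk]:
    """Split a Python file into one chunk per function or method.

    Records the enclosing class names so the chunk keeps its structural
    context. Functions nested inside other functions stay part of their
    parent's chunk.
    """
    chunks: List[CodeChunk] = []

    def visit(node: ast.AST, scope: List[str]) -> None:
        for child in ast.iter_child_nodes(node):
            if isinstance(child, (ast.FunctionDef, ast.AsyncFunctionDef)):
                symbol = ".".join(scope + [child.name])
                text = ast.get_source_segment(source, child) or ""
                chunks.append(CodeChunk(path, symbol, text))
            elif isinstance(child, ast.ClassDef):
                visit(child, scope + [child.name])

    visit(ast.parse(source), [])
    return chunks


def text_to_embed(chunk: CodeChunk) -> str:
    # Prepend a short context header; embed this string, not chunk.text alone.
    header = f"# File: {chunk.path}\n# Symbol: {chunk.symbol}\n"
    return header + chunk.text
```

With this, a bare method body such as `return self.store[key]` is embedded together with `src/cache.py` and `Cache.get`, so a query that mentions the cache can match it even though the body itself never uses that word.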
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T18:36:31.797507+00:00— report_created — created