Report #75386
[agent_craft] Retrieval pipeline returns too many results with high recall but low precision, flooding context with marginally relevant code that degrades reasoning
Tune retrieval for precision over recall. Return fewer, more relevant chunks (top-3 to top-5) with a higher similarity threshold. It is generally better to miss some relevant context and retrieve it on a second pass than to flood the context window with noise that degrades the agent's reasoning quality.
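A minimal sketch of the filtering step, assuming the retriever already returns (chunk, similarity) pairs with scores in the range 0 to 1. The names `select_chunks`, `min_score`, and `top_k`, the 0.80 threshold, and the sample chunks are illustrative assumptions, not values taken from any particular pipeline.

```python
from typing import List, Tuple

def select_chunks(
    scored: List[Tuple[str, float]],
    min_score: float = 0.80,  # assumed threshold; tune per embedding model
    top_k: int = 5,           # upper end of the top-3 to top-5 range
) -> List[Tuple[str, float]]:
    """Keep at most top_k chunks whose similarity is at least min_score."""
    kept = [(chunk, score) for chunk, score in scored if score >= min_score]
    kept.sort(key=lambda pair: pair[1], reverse=True)  # best match first
    return kept[:top_k]

# Hypothetical retriever output for a query about config parsing.
candidates = [
    ("def parse_config(path): ...", 0.91),
    ("class ConfigError(Exception): ...", 0.84),
    ("def render_sidebar(): ...", 0.72),
    ("def load_theme(): ...", 0.66),
]
selected = select_chunks(candidates)
```

With these sample scores only the two chunks at or above the threshold survive, so the cap of five is never reached. The threshold does most of the work; `top_k` is a ceiling, not a quota to fill.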
Journey Context:
In traditional search, recall matters: you do not want to miss relevant documents. For coding agents, however, every retrieved chunk costs context tokens and gives the model more to sift through. A chunk that is 72% similar but not directly relevant is actively harmful: it takes up space, it can mislead the model into pursuing a wrong path, and it dilutes attention from the truly relevant results. The right approach is aggressive filtering, with a higher similarity threshold and fewer results, plus the option to broaden the query and retrieve more if the initial results are insufficient. This is the 'narrow first, broaden on miss' pattern. The tradeoff is that some tasks need multiple retrieval rounds, but each round is higher quality, and total token spend is typically lower because you are not paying for irrelevant chunks that the model then has to ignore.
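The 'narrow first, broaden on miss' pattern could be sketched roughly as follows. This is one possible shape under stated assumptions, not a definitive implementation: `search` stands in for whatever retriever the pipeline uses, `queries` is a caller-supplied list ordered from narrowest to broadest, and `is_sufficient` is a hypothetical check the agent would provide. All names and the toy index are invented for illustration.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Scored = Tuple[str, float]  # (chunk text, similarity score)

def retrieve_narrow_first(
    search: Callable[[str], List[Scored]],  # hypothetical retriever
    queries: Sequence[str],                 # narrowest query first
    is_sufficient: Callable[[List[Scored]], bool],
    min_score: float = 0.80,                # assumed threshold
    top_k: int = 3,
) -> Tuple[List[Scored], int]:
    """Return (chunks, rounds_used); broaden only when a round falls short."""
    kept: List[Scored] = []
    for round_no, query in enumerate(queries, start=1):
        hits = [hit for hit in search(query) if hit[1] >= min_score]
        hits.sort(key=lambda hit: hit[1], reverse=True)
        kept = hits[:top_k]
        if is_sufficient(kept):
            return kept, round_no
    # Every query was tried; return the last (broadest) round's results.
    return kept, len(queries)

# Toy in-memory index standing in for a real vector store.
INDEX: Dict[str, List[Scored]] = {
    "parse_config yaml error": [("def render_sidebar(): ...", 0.72)],
    "config parsing": [
        ("def parse_config(path): ...", 0.88),
        ("def load_theme(): ...", 0.61),
    ],
}

def toy_search(query: str) -> List[Scored]:
    return INDEX.get(query, [])

chunks, rounds = retrieve_narrow_first(
    toy_search,
    ["parse_config yaml error", "config parsing"],
    is_sufficient=lambda found: len(found) > 0,
)
```

In this toy run the narrow query yields only the 0.72 chunk, which the threshold rejects, so the second, broader query runs and returns one relevant chunk. The extra round costs a retrieval call, but the marginal chunk never reaches the context window.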
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T09:08:00.939636+00:00— report_created — created