Report #45986
[agent\_craft] Retrieval-Augmented Generation injects too much irrelevant code, diluting the reasoning capacity of the context window
Apply a strict context budget \(token limit\) to retrieved chunks, and rerank the top-K candidates with a cross-encoder, dropping any that don't directly answer the query before injecting the rest into the prompt.
Journey Context:
Naive RAG ranks chunks by embedding similarity \(e.g. dot product or cosine\), which often returns code that is conceptually related but practically useless, such as a line that imports a function rather than the one that defines it. Injecting 10k tokens of loosely related code degrades the LLM's instruction-following ability. A cross-encoder scores each query-chunk pair jointly, so it judges relevance much more accurately than comparing independently computed embeddings, and a strict budget forces the agent to rely on high-signal context.
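The rerank-then-budget step described above could look roughly like the sketch below. It is an illustration under assumptions, not a verified implementation: `select_chunks`, `approx_tokens`, `toy_score`, the `min_score` threshold, and the whitespace token estimate are all hypothetical names and choices introduced here. The scorer is deliberately pluggable; in a real pipeline it would presumably wrap a cross-encoder model, and the token counter would be the target model's tokenizer.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical scorer type: takes (query, chunk) pairs, returns one relevance
# score per pair. A real cross-encoder would be wrapped to fit this shape;
# that wiring is an assumption and is not shown here.
ScoreFn = Callable[[Sequence[Tuple[str, str]]], Sequence[float]]


def approx_tokens(text: str) -> int:
    """Crude token estimate (whitespace split); swap in the model's tokenizer."""
    return len(text.split())


def select_chunks(
    query: str,
    chunks: Sequence[str],
    score_fn: ScoreFn,
    token_budget: int,
    min_score: float = 0.5,
    count_tokens: Callable[[str], int] = approx_tokens,
) -> List[str]:
    """Rerank retrieved chunks, drop weak ones, and pack the rest into a budget."""
    if not chunks:
        return []
    scores = score_fn([(query, chunk) for chunk in chunks])
    # Highest-scoring chunks first; ties keep their original retrieval order.
    ranked = sorted(zip(scores, chunks), key=lambda pair: pair[0], reverse=True)

    selected: List[str] = []
    used = 0
    for score, chunk in ranked:
        if score < min_score:
            break  # everything after this point scores lower still
        cost = count_tokens(chunk)
        if used + cost > token_budget:
            continue  # too big for the remaining budget; a smaller chunk may fit
        selected.append(chunk)
        used += cost
    return selected


def toy_score(pairs: Sequence[Tuple[str, str]]) -> List[float]:
    """Demo-only stand-in scorer: favors chunks that define the queried name."""
    return [1.0 if f"def {query}" in chunk else 0.1 for query, chunk in pairs]


if __name__ == "__main__":
    retrieved = [
        "from utils import parse_config",
        "def parse_config(path):\n    return open(path).read()",
    ]
    for chunk in select_chunks("parse_config", retrieved, toy_score, token_budget=50):
        print(chunk)
```

With the toy scorer, the import line falls below `min_score` and only the definition is kept, which mirrors the import-versus-definition case above. Both the threshold and the budget are tuning knobs rather than recommended values.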
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T07:39:46.515257+00:00— report_created — created