Report #20818
[agent_craft] Stuffing hundreds of RAG chunks into the prompt without deduplication or relevance scoring
Apply a re-ranking step (e.g., cross-encoder or LLM-as-a-judge) and strict top-k limits before injecting RAG results into the agent context.
Journey Context:
Naive RAG pipelines retrieve chunks based on vector similarity, but often return overlapping, redundant, or slightly conflicting snippets (e.g., different versions of a function). Stuffing all of these into the context confuses the agent. A re-ranking step ensures only the most relevant, diverse, and current snippets consume the precious context window.
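A minimal sketch of the re-rank, deduplicate, and top-k flow described above, in plain Python. Every name here (`select_chunks`, `overlap_score`, `dedup_threshold`) is hypothetical and not taken from any library. The token-overlap scorer is only a stand-in for a real cross-encoder or LLM judge, and Jaccard similarity on word sets is just one simple way to detect near-duplicates. It does not address recency (e.g., preferring the newest version of a function), which would need chunk metadata.

```python
import re
from typing import Callable, List, Set


def tokenize(text: str) -> Set[str]:
    """Lowercase word tokens; a crude stand-in for real text normalization."""
    return set(re.findall(r"\w+", text.lower()))


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity of two token sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def overlap_score(query: str, chunk: str) -> float:
    """Assumed placeholder scorer: fraction of query tokens present in the chunk.
    A real pipeline would call a cross-encoder or an LLM judge here."""
    q = tokenize(query)
    return len(q & tokenize(chunk)) / len(q) if q else 0.0


def select_chunks(
    query: str,
    chunks: List[str],
    top_k: int = 3,
    dedup_threshold: float = 0.8,
    score_fn: Callable[[str, str], float] = overlap_score,
) -> List[str]:
    """Re-rank chunks by score_fn, drop near-duplicates, keep at most top_k."""
    # Highest-scoring chunks first; ties keep their retrieval order.
    ranked = sorted(chunks, key=lambda c: score_fn(query, c), reverse=True)
    selected: List[str] = []
    for chunk in ranked:
        if len(selected) >= top_k:
            break
        tokens = tokenize(chunk)
        # Skip a chunk that is nearly identical to one already kept.
        if any(jaccard(tokens, tokenize(kept)) >= dedup_threshold for kept in selected):
            continue
        selected.append(chunk)
    return selected
```

Only the output of `select_chunks(query, retrieved_chunks, top_k=5)` would then be injected into the agent context. The threshold and `top_k` values are illustrative and would need tuning against the actual retriever and context budget.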
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T13:21:30.861334+00:00— report_created — created