Report #8287
[agent_craft] RAG returns too many large code snippets, crowding relevant context out of the model's context window
Implement a two-stage retrieval pipeline. Stage 1 retrieves candidate chunks via semantic search. Stage 2 uses a lightweight LLM or a cross-encoder to re-rank the candidates and filter them strictly against the current task, returning only the top-K (e.g., top-3) most relevant snippets.
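A minimal Python sketch of the two-stage pipeline, written against the standard library only so it runs as-is. Both scorers are assumptions standing in for the real components, not part of the report: stage 1 uses bag-of-words cosine in place of embedding search against a vector store, and stage 2 uses a hypothetical query-term-overlap scorer (`overlap_scorer`) in place of a cross-encoder or small LLM judge. All names (`Chunk`, `stage1_recall`, `stage2_rerank`, `retrieve`) and the defaults `n_candidates=20` and `min_score=0.5` are illustrative.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass
class Chunk:
    chunk_id: str
    text: str


def _tokens(text: str) -> List[str]:
    # Lowercase alphanumeric runs; also splits snake_case identifiers into words.
    return re.findall(r"[a-z0-9]+", text.lower())


def _cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def stage1_recall(query: str, chunks: Sequence[Chunk], n: int) -> List[Chunk]:
    """Stage 1: cheap, high-recall candidate retrieval over the whole corpus.

    Bag-of-words cosine is a stand-in for embedding similarity in a vector store.
    """
    q = Counter(_tokens(query))
    scored = [(_cosine(q, Counter(_tokens(c.text))), c) for c in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [c for s, c in scored[:n] if s > 0.0]


def overlap_scorer(query: str, text: str) -> float:
    """Hypothetical stand-in for a cross-encoder: scores the (query, chunk) pair
    jointly as the fraction of distinct query terms that appear in the chunk."""
    q = set(_tokens(query))
    return len(q & set(_tokens(text))) / len(q) if q else 0.0


# Assumed swap-in for a real cross-encoder (requires sentence-transformers;
# its scores are unbounded logits, so min_score must be re-tuned):
#   from sentence_transformers import CrossEncoder
#   model = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
#   scorer = lambda q, t: float(model.predict([(q, t)])[0])


def stage2_rerank(
    query: str,
    candidates: Sequence[Chunk],
    scorer: Callable[[str, str], float],
    top_k: int = 3,
    min_score: float = 0.5,
) -> List[Tuple[float, Chunk]]:
    """Stage 2: score each candidate against the task, drop anything below
    min_score, and keep at most top_k of the rest, best first."""
    scored = [(scorer(query, c.text), c) for c in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(s, c) for s, c in scored if s >= min_score][:top_k]


def retrieve(
    query: str,
    chunks: Sequence[Chunk],
    scorer: Callable[[str, str], float] = overlap_scorer,
    n_candidates: int = 20,
    top_k: int = 3,
    min_score: float = 0.5,
) -> List[Chunk]:
    candidates = stage1_recall(query, chunks, n_candidates)
    return [c for _, c in stage2_rerank(query, candidates, scorer, top_k, min_score)]


corpus = [
    Chunk("auth", "def refresh_token(session): renew the oauth token for the session"),
    Chunk("log", "def configure_logging(level): set up the root logger"),
    Chunk("db", "def open_session(url): create a database session pool"),
    Chunk("jwt", "def parse_token(raw): split a jwt token into header and payload"),
]
print([c.chunk_id for c in retrieve("refresh the oauth token", corpus, top_k=2)])
# prints ['auth']
```

The split mirrors the rationale in the report: stage 1 is cheap and tuned for recall over the whole corpus, while the more expensive pairwise scorer only sees the short candidate list. The `min_score` cutoff is what makes the filter strict: even with `top_k=2`, the demo returns only `auth`, because the other stage-1 candidates each match just one query term.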
Journey Context:
Naive RAG injects large text dumps ranked only by vector similarity, which often pulls in tangentially related code that wastes tokens and degrades the LLM's instruction-following. Models also tend to underuse information placed in the middle of a long context (the 'lost in the middle' phenomenon), so a relevant snippet buried among weak matches may be effectively ignored. Re-ranking ensures that only strictly pertinent context occupies the window, trading a small latency increase (one extra scoring pass over the candidates) for a substantial gain in downstream generation accuracy. Without it, the agent hallucinates or ignores the crucial context because it is buried in noise.
⚠ Workarounds are unverified; always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-16T05:10:24.777154+00:00— report_created — created