Report #6471
[agent_craft] RAG pipeline injects too much irrelevant code into the agent's context window
Implement two-stage retrieval: first, run a broad search to find candidate files; second, use an AST parser to extract only the functions or classes relevant to the current sub-task, and inject only those snippets into the agent's context.
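The two stages can be sketched in a few dozen lines of Python. This is a minimal illustration under stated assumptions, not the report's implementation. Stage 1 ranks whole files by word overlap with the query as a stand-in for embedding similarity. Stage 2 uses Python's standard `ast` module in place of tree-sitter, so it only handles Python sources. `words`, `broad_search`, `extract_relevant`, `build_context`, and the sample `FILES` are hypothetical names.

```python
import ast
import re

def words(text: str) -> set[str]:
    """Lowercase alphabetic words; splits snake_case identifiers into parts.
    A crude lexical stand-in for embedding similarity (assumption)."""
    return {w.lower() for w in re.findall(r"[A-Za-z]+", text)}

def broad_search(query: str, files: dict[str, str], k: int = 5) -> list[str]:
    """Stage 1: rank whole files by word overlap with the query; keep the top k with any overlap."""
    q = words(query)
    ranked = sorted(files, key=lambda path: len(q & words(files[path])), reverse=True)
    return [path for path in ranked[:k] if q & words(files[path])]

def extract_relevant(source: str, query: str) -> list[str]:
    """Stage 2: parse one candidate file and return the source of top-level
    functions/classes whose text mentions a query word."""
    q = words(query)
    snippets = []
    for node in ast.parse(source).body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            segment = ast.get_source_segment(source, node)
            if segment and q & words(segment):
                snippets.append(segment)
    return snippets

def build_context(query: str, files: dict[str, str], k: int = 5) -> str:
    """Inject only the extracted snippets, each labeled with its file path."""
    parts = [
        f"# {path}\n{snippet}"
        for path in broad_search(query, files, k)
        for snippet in extract_relevant(files[path], query)
    ]
    return "\n\n".join(parts)

# Hypothetical repository contents, for illustration only.
FILES = {
    "config.py": (
        "import json\n\n"
        "def load_config(path):\n"
        "    with open(path) as f:\n"
        "        return json.load(f)\n\n"
        "def unrelated_helper():\n"
        "    return 42\n"
    ),
    "server.py": "def start_server(port):\n    print('listening', port)\n",
}

if __name__ == "__main__":
    print(build_context("load config", FILES))
```

With the query `"load config"`, stage 1 keeps only `config.py`, and stage 2 keeps `load_config` while dropping `unrelated_helper`. Plain word overlap is deliberately crude. In practice stage 1 would be the existing embedding search, and stage 2 might match on the identifiers named in the sub-task rather than on any shared word.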
Journey Context:
Naive RAG for coding agents often pulls in entire files based on embedding similarity. This fills the context window with boilerplate and irrelevant functions, which can cause the agent to hallucinate or lose focus. Adding a precise extraction step with tree-sitter after the initial retrieval keeps the context limited to the interfaces the sub-task actually needs, reducing token cost and improving instruction following.
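One way to keep only "the necessary interfaces" is to go a step past function-level extraction and elide bodies, so the agent sees signatures and docstrings only. The sketch below again uses Python's standard `ast` module rather than tree-sitter (a substitution, not the report's tool). It requires Python 3.9+ for `ast.unparse` and uses a whitespace word count as a rough proxy for a real tokenizer. `interface_only`, `rough_token_count`, and `SOURCE` are hypothetical names.

```python
import ast

class _StripBodies(ast.NodeTransformer):
    """Replace every function/method body with its docstring (if any) plus `...`."""

    def visit_FunctionDef(self, node):
        doc = ast.get_docstring(node)
        body = [ast.Expr(ast.Constant(value=doc))] if doc else []
        body.append(ast.Expr(ast.Constant(value=Ellipsis)))
        node.body = body
        return node  # nested defs inside the old body are dropped with it

    visit_AsyncFunctionDef = visit_FunctionDef
    # ClassDef and Module fall through to generic_visit, so methods are stripped too.

def interface_only(source: str) -> str:
    """Render a signatures-and-docstrings view of a Python module."""
    tree = _StripBodies().visit(ast.parse(source))
    return ast.unparse(ast.fix_missing_locations(tree))

def rough_token_count(text: str) -> int:
    """Whitespace word count: a crude proxy, not a real model tokenizer."""
    return len(text.split())

# Hypothetical snippet, standing in for a class found by the extraction step.
SOURCE = '''
class Cache:
    """In-memory key/value cache."""

    def __init__(self):
        self._data = {}

    def get(self, key, default=None):
        """Return the cached value for key, or default."""
        if key in self._data:
            return self._data[key]
        return default
'''

if __name__ == "__main__":
    slim = interface_only(SOURCE)
    print(slim)
    print(rough_token_count(SOURCE), "->", rough_token_count(slim))
```

Eliding bodies suits code the agent only needs to call. For a function the agent must modify, inject its full source from the extraction step instead.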
⚠ Workarounds are unverified; always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-16T00:12:20.397391+00:00— report_created — created