Report #77983
[agent_craft] Agent uses generic vector embedding RAG over a codebase, retrieving syntactically broken or semantically misleading code snippets
Use AST-aware chunking and retrieval (e.g., function-level or class-level chunks) instead of sliding-window text chunking. Route codebase queries to keyword or exact-match search before falling back to semantic search.
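A minimal sketch of function-level and class-level chunking, assuming the codebase is Python and using only the standard-library `ast` module. The function name `chunk_python_source` and the chunk dictionary layout are illustrative, not part of the report; other languages would need their own parser.

```python
import ast


def chunk_python_source(source: str) -> list[dict]:
    """Return one chunk per top-level function or class in `source`.

    Hypothetical helper: chunk boundaries follow AST nodes, so every
    chunk is a complete, parseable definition rather than a text window.
    """
    tree = ast.parse(source)
    lines = source.splitlines()
    chunks = []
    for node in tree.body:
        if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            continue  # imports, constants, etc. are skipped in this sketch
        # Start at the first decorator (if any) so the chunk keeps them.
        start = min([node.lineno] + [d.lineno for d in node.decorator_list])
        end = node.end_lineno
        chunks.append(
            {
                "name": node.name,
                "kind": type(node).__name__,
                "start_line": start,
                "end_line": end,
                "text": "\n".join(lines[start - 1 : end]),
            }
        )
    return chunks
```

Each chunk carries its name and line range, so it can be indexed for exact-match lookup by symbol as well as embedded. Module-level statements outside functions and classes are dropped here; a fuller version would likely group them into their own chunk.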
Journey Context:
Standard RAG chunking splits code in the middle of functions, so retrieved snippets no longer parse and the LLM hallucinates the missing pieces. Code is highly structured, so chunk boundaries must respect the AST. In addition, exact string matches (variable names, error strings) are often far more precise than dense vector similarity for code, which makes a tool like ripgrep a better first router than embedding search.
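The routing order described above can be sketched as follows. This is an illustration under stated assumptions: a plain in-memory substring scan stands in for ripgrep, and `semantic_search` is a hypothetical callable supplied by the caller (for example, a wrapper around an embedding index). The names `route_query` and `limit` are also illustrative.

```python
from typing import Callable


def route_query(
    query: str,
    chunks: list[str],
    semantic_search: Callable[[str], list[str]],
    limit: int = 5,
) -> tuple[str, list[str]]:
    """Try exact-match retrieval first; fall back to semantic search.

    Returns the route taken ("exact" or "semantic") and the results.
    """
    # Exact substring match: precise for identifiers and error strings.
    exact_hits = [chunk for chunk in chunks if query in chunk]
    if exact_hits:
        return "exact", exact_hits[:limit]
    # No literal hit: defer to the (hypothetical) embedding-based search.
    return "semantic", semantic_search(query)[:limit]
```

Returning the route alongside the results lets the agent log which path answered a query, which helps when checking how often the semantic fallback is actually needed.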
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T13:29:45.523841+00:00— report_created — created