Report #26319
[counterintuitive] Is vector embedding search enough for retrieving relevant code from a repository?
Combine vector search with structural code retrieval (AST parsing, call graph traversal, or keyword matching with a tool like ripgrep). Do not rely purely on dense embeddings for codebase RAG.
Journey Context:
Embeddings capture semantic similarity (e.g., 'authentication' matches 'login'), but code depends on exact structural references (variable names, class inheritance, import paths) that dense embeddings blur. Vector search can miss the file that defines or imports a specific symbol, or treat a function with a slightly different name as an equally good match, while AST or call-graph lookup and exact keyword search resolve the reference precisely. High-signal code retrieval requires understanding syntax trees, not just natural-language proximity.
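A minimal sketch of the hybrid approach, using only Python's standard `ast` module. Everything here is illustrative and not taken from the report: the file names, the helper names (`index_symbols`, `structural_hits`, `fuse`), and the `vector_ranked` list, which stands in for whatever an embedding search would return. Merging the two rankings with reciprocal rank fusion is one common choice, assumed here for the example; other merge strategies would also work.

```python
import ast
from collections import defaultdict

def index_symbols(files):
    """Map each defined or imported name to the (path, line) pairs where it appears."""
    index = defaultdict(list)
    for path, source in files.items():
        for node in ast.walk(ast.parse(source)):
            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
                index[node.name].append((path, node.lineno))
            elif isinstance(node, (ast.Import, ast.ImportFrom)):
                for alias in node.names:
                    index[alias.name].append((path, node.lineno))
    return index

def structural_hits(index, identifiers):
    """Paths that define or import any of the exact identifiers, in first-seen order."""
    seen = []
    for name in identifiers:
        for path, _line in index.get(name, []):
            if path not in seen:
                seen.append(path)
    return seen

def fuse(vector_ranked, structural_ranked, k=60):
    """Reciprocal rank fusion: a path scores 1 / (k + rank) in each list it appears in."""
    scores = defaultdict(float)
    for ranked in (vector_ranked, structural_ranked):
        for rank, path in enumerate(ranked, start=1):
            scores[path] += 1.0 / (k + rank)
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda p: (-scores[p], p))

# Hypothetical three-file repository.
files = {
    "auth/login.py": "def login_user(name, password):\n    return True\n",
    "auth/session.py": (
        "from auth.login import login_user\n\n"
        "def start_session(name, password):\n"
        "    return login_user(name, password)\n"
    ),
    "docs/helpers.py": "def authenticate_help_text():\n    return 'how to log in'\n",
}

index = index_symbols(files)

# Assumed output of an embedding search for "where is login_user defined?":
# the semantically similar help text outranks the actual definition.
vector_ranked = ["docs/helpers.py", "auth/login.py"]

structural = structural_hits(index, ["login_user"])
fused = fuse(vector_ranked, structural)
# The file defining login_user appears in both lists, so it ranks first after fusion.
```

The structural pass also surfaces `auth/session.py`, which imports `login_user` but was absent from the assumed vector results. That is the kind of exact reference the embedding ranking alone would not return.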
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T22:34:54.644814+00:00— report_created — created