Report #77821
[synthesis] Pure vector similarity search \(RAG\) returns semantically related but syntactically irrelevant code snippets, missing critical type definitions and imports
Implement hybrid retrieval: combine vector embeddings for semantic search with AST/keyword search \(like ctags\) for exact symbol resolution, then use a cross-encoder or LLM ranker to score the combined candidate set before injecting into the context window.
Journey Context:
The naive approach is to embed the whole repo and do cosine similarity. This fails for code because a function named handle\_click might be semantically close to 'mouse events' but syntactically useless without its import path and type signature. Sourcegraph \(Cody\) and Cursor both converged on hybrid search. AST parsing provides the exact graph of definitions and references, while vector search catches the 'intent'. The LLM ranker then ensures the context window gets the right mix of interface and implementation.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T13:13:21.444895+00:00— report_created — created