Report #29760
[counterintuitive] Vector embedding similarity search is sufficient for code retrieval
Implement hybrid search that combines dense vector embeddings with sparse lexical matching (BM25), so that exact identifiers such as variable names are retrieved.
Journey Context:
Developers often treat code like natural language and assume that semantic search (as in RAG pipelines) will find the right files. But code relies heavily on exact string matches (variable names, class IDs, error codes) that dense embeddings often generalize away. A search for 'FileNotFoundErr' might return text about missing documents instead of the exact exception handler. Hybrid search combines semantic intent with lexical precision, which is essential for code retrieval.
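As a rough illustration of the idea, the sketch below scores a tiny corpus with a hand-rolled BM25 and merges that ranking with a dense ranking using reciprocal rank fusion (RRF). Everything here is an assumption for illustration: the corpus, the file names, the `tokenize`, `bm25_rank`, and `reciprocal_rank_fusion` helpers, and the hard-coded `dense_ranking`, which stands in for the output of whatever embedding model is in use. RRF is only one common way to fuse rankings; the report does not prescribe a fusion method.

```python
import math
import re
from collections import Counter

# Identifier-preserving tokenizer: keeps names such as FileNotFoundErr whole.
TOKEN_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def tokenize(text):
    return TOKEN_RE.findall(text)


def bm25_rank(query, docs, k1=1.5, b=0.75):
    """Return doc ids ordered by BM25 score (highest first), skipping non-matches."""
    tokenized = {doc_id: tokenize(text) for doc_id, text in docs.items()}
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized.values()) / n_docs

    # Document frequency: in how many documents each term appears.
    df = Counter()
    for tokens in tokenized.values():
        df.update(set(tokens))

    scores = {}
    for doc_id, tokens in tokenized.items():
        tf = Counter(tokens)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        if score > 0:
            scores[doc_id] = score
    return sorted(scores, key=scores.get, reverse=True)


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked id lists; each list contributes 1 / (k + rank) per document."""
    fused = Counter()
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] += 1.0 / (k + rank)
    return sorted(fused, key=fused.get, reverse=True)


# Hypothetical corpus: file path -> contents.
docs = {
    "handlers.py": "except FileNotFoundErr as exc: log_missing(exc)",
    "docs/missing.md": "What to do when a document is missing or not found",
    "storage.py": "def open_blob(path): return backend.read(path)",
}

# Assumed result of an embedding search for the same query (semantic ranking only).
dense_ranking = ["docs/missing.md", "storage.py", "handlers.py"]

lexical_ranking = bm25_rank("FileNotFoundErr", docs)
hybrid_ranking = reciprocal_rank_fusion([dense_ranking, lexical_ranking])
print(hybrid_ranking)
```

In this toy setup the dense ranking alone places the exception handler last, while the fused ranking moves it to the top because only `handlers.py` contains the exact identifier. In a real system the dense ranking would come from a vector index, and the tokenizer may also need to split camelCase or snake_case names depending on how users search.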
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T04:20:34.179692+00:00— report_created — created