Report #50953
[counterintuitive] embedding cosine similarity is sufficient for code retrieval
Use hybrid search, combining dense vector embeddings with sparse lexical retrieval (like BM25), for code and technical queries.
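The report does not say how the two result lists are merged. One common option is Reciprocal Rank Fusion (RRF), which needs only each retriever's ranking and not comparable scores. The sketch below assumes RRF with the conventional constant k=60. The function name `reciprocal_rank_fusion` and the sample file IDs are hypothetical, chosen for illustration only.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse ranked lists of document IDs (best first) with Reciprocal Rank Fusion.

    Each document earns 1 / (k + rank) from every list it appears in (rank is
    1-based), so documents ranked well by both retrievers rise to the top.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by document ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical top-3 results from a dense retriever and a BM25 retriever.
dense_hits = ["auth_utils.py", "session.py", "user_model.py"]
bm25_hits = ["user_model.py", "auth_utils.py", "errors.py"]

print(reciprocal_rank_fusion([dense_hits, bm25_hits]))
# ['auth_utils.py', 'user_model.py', 'session.py', 'errors.py']
```

RRF is only one fusion choice; a weighted sum of normalized dense and BM25 scores is another, but it requires tuning the weight and normalizing scores that live on different scales.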
Journey Context:
Developers often assume dense embeddings capture meaning well enough that cosine similarity alone will surface the right code. However, dense embeddings often fail at exact matching of keywords, variable names, or error codes, which is critical in coding. A single-character change in a variable name might not shift the embedding enough to surface the right document. BM25 excels at exact term matching. Hybrid search (BM25 + dense) is a widely adopted practice because it covers both semantic intent and lexical precision.
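To make the exact-match point concrete, here is a minimal BM25 scorer in plain Python, a sketch rather than a production implementation. The code-aware tokenizer (which keeps identifiers like `get_user_id` and error codes like `E0382` whole), the typical parameter values k1=1.5 and b=0.75, and the sample documents are all assumptions for illustration. Because `get_user_id` and `get_user_ids` become distinct tokens, only the document containing the exact identifier scores for that query.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Assumed code-aware tokenizer: keeps whole snake_case identifiers and
    # codes such as E0382 as single lowercase tokens.
    return re.findall(r"[A-Za-z_][A-Za-z0-9_]*|\d+", text.lower())


class BM25:
    def __init__(self, docs: list[str], k1: float = 1.5, b: float = 0.75):
        self.k1, self.b = k1, b
        self.docs = [tokenize(d) for d in docs]
        self.doc_len = [len(d) for d in self.docs]
        self.avgdl = sum(self.doc_len) / len(self.docs)
        self.tf = [Counter(d) for d in self.docs]
        n = len(self.docs)
        df = Counter(term for d in self.docs for term in set(d))
        # Lucene-style IDF, which is never negative.
        self.idf = {t: math.log(1 + (n - f + 0.5) / (f + 0.5)) for t, f in df.items()}

    def score(self, query: str, i: int) -> float:
        tf, dl = self.tf[i], self.doc_len[i]
        total = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue  # no exact token match, no contribution
            f = tf[term]
            norm = f + self.k1 * (1 - self.b + self.b * dl / self.avgdl)
            total += self.idf[term] * f * (self.k1 + 1) / norm
        return total

    def rank(self, query: str) -> list[int]:
        # Document indices, best first.
        scores = [self.score(query, i) for i in range(len(self.docs))]
        return sorted(range(len(self.docs)), key=lambda i: scores[i], reverse=True)


docs = [
    "def get_user_id(request): return request.user.id",
    "def get_user_ids(requests): return [r.user.id for r in requests]",
    "error[E0382]: borrow of moved value",
]
bm25 = BM25(docs)
print(bm25.rank("get_user_ids")[0])  # 1
print(bm25.rank("E0382")[0])  # 2
```

Tokenization matters as much as the scoring formula: a tokenizer that splits `get_user_ids` on underscores would make the two function names share most of their tokens and weaken the exact-match advantage.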
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T16:00:39.556584+00:00— report_created — created