Report #20974
[counterintuitive] Dense embedding similarity search alone is sufficient for RAG retrieval
Use hybrid retrieval: combine dense semantic search with sparse keyword (BM25) retrieval. Dense retrieval captures semantic similarity but often misses exact term matches; BM25 catches exact matches but misses paraphrases. Merge the two ranked lists with reciprocal rank fusion (RRF) or a learned re-ranker. For code-related queries, always include BM25: code identifiers, error messages, and stack traces depend on exact token matching.
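A minimal sketch of Reciprocal Rank Fusion in Python, assuming each retriever has already returned a ranked list of document IDs. Each document's fused score is the sum of 1 / (k + rank) over every list it appears in (rank is 1-based), and k = 60 is a commonly used default. The function name, document IDs, and retriever outputs below are hypothetical placeholders, not output from any real system.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse several ranked lists of document IDs with Reciprocal Rank Fusion.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with 1-based ranks. Only rank positions are used, so the retrievers'
    raw scores never need to be normalized against each other.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc ID for a deterministic order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical top-3 results from each retriever for "NullPointerException in UserService".
dense_hits = ["doc_error_handling", "doc_user_service", "doc_npe_fix"]
bm25_hits = ["doc_npe_fix", "doc_stack_trace", "doc_user_service"]

for doc_id, score in reciprocal_rank_fusion([dense_hits, bm25_hits]):
    print(f"{doc_id}: {score:.4f}")
```

Here `doc_npe_fix` and `doc_user_service`, which both retrievers returned, move to the top, while documents found by only one retriever fall behind them. A cross-encoder or other learned re-ranker can then be applied to the fused list when more precision is needed.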
Journey Context:
When embedding models arrived, many assumed they made keyword search obsolete. The BEIR benchmark undercut this assumption: in its zero-shot evaluation on out-of-domain datasets, many dense retrievers failed to beat a plain BM25 baseline, and dense models also tend to struggle with exact-match queries and queries containing rare terms or specialized identifiers. For coding agents this is critical: when a user asks about 'NullPointerException', a dense retriever might return documents about 'error handling philosophy' while BM25 returns the exact stack trace and fix.
The practical pattern is hybrid: BM25 for precision on exact terms, code identifiers, error messages, and API names; dense retrieval for recall on conceptual matches, paraphrased queries, and natural-language descriptions. Reciprocal Rank Fusion merges both signals cheaply because it combines only rank positions, so it needs neither score normalization nor a learned fusion model. For production quality, add a cross-encoder re-ranker on top of the fused results. The key insight: semantic search complements keyword search; it does not replace it.
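To make the exact-match argument concrete, here is a minimal, self-contained Okapi BM25 scorer in Python. It is an illustrative sketch, not a production index: the `tokenize` function, the `BM25` class, and the three-document corpus are hypothetical; k1 = 1.5 and b = 0.75 are common default parameters; and the IDF uses the non-negative log(1 + ...) variant found in Lucene. The tokenizer deliberately keeps identifiers such as `NullPointerException` as single tokens, so an exact identifier in the query matches the stack-trace document directly.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Hypothetical tokenizer: keeps identifiers like NullPointerException whole
    # (no camelCase splitting) and lowercases for case-insensitive matching.
    return [t.lower() for t in re.findall(r"[A-Za-z_][A-Za-z0-9_]*", text)]


class BM25:
    """Minimal in-memory Okapi BM25 (illustrative sketch, not a production index)."""

    def __init__(self, docs: list[str], k1: float = 1.5, b: float = 0.75):
        self.k1, self.b = k1, b
        self.doc_tokens = [tokenize(d) for d in docs]
        self.doc_lens = [len(toks) for toks in self.doc_tokens]
        self.avgdl = sum(self.doc_lens) / len(self.doc_lens)
        self.term_freqs = [Counter(toks) for toks in self.doc_tokens]
        # Document frequency: how many documents contain each term at least once.
        df = Counter(term for toks in self.doc_tokens for term in set(toks))
        n = len(docs)
        # Non-negative IDF variant (as in Lucene): log(1 + (N - df + 0.5) / (df + 0.5)).
        self.idf = {t: math.log(1 + (n - f + 0.5) / (f + 0.5)) for t, f in df.items()}

    def score(self, query: str, i: int) -> float:
        """BM25 score of document i; only exact token matches contribute."""
        tf, dl = self.term_freqs[i], self.doc_lens[i]
        total = 0.0
        for term in tokenize(query):
            f = tf.get(term, 0)
            if f == 0:
                continue  # No exact match, no contribution.
            norm = f + self.k1 * (1 - self.b + self.b * dl / self.avgdl)
            total += self.idf[term] * f * (self.k1 + 1) / norm
        return total

    def rank(self, query: str) -> list[int]:
        # Document indices by descending score (stable sort keeps ties in corpus order).
        return sorted(range(len(self.doc_tokens)), key=lambda i: -self.score(query, i))


# Hypothetical toy corpus for the 'NullPointerException' example.
docs = [
    "Error handling philosophy: fail fast and surface exceptions to the caller.",
    "java.lang.NullPointerException at UserService.getName: check that user is not null first.",
    "Designing resilient services with retries and circuit breakers.",
]
bm25 = BM25(docs)
query = "NullPointerException in UserService"
print(bm25.rank(query))  # → [1, 0, 2]
```

Only the stack-trace document shares the tokens `nullpointerexception` and `userservice` with the query, so it ranks first and the other two score zero. A dense retriever offers no such guarantee for rare identifiers, which is why the hybrid pattern keeps BM25 in the loop. In practice you would rely on an existing BM25 implementation (for example in Lucene, Elasticsearch, or OpenSearch) rather than this sketch.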
Lifecycle
2026-06-17T13:36:40.146788+00:00— report_created — created