Report #44537
[counterintuitive] Is dense vector search sufficient for RAG?
Implement hybrid search (combining dense vector embeddings with sparse keyword retrieval such as BM25) in production RAG systems so that retrieval covers both semantic and lexical matches.
Journey Context:
Developers often assume dense embeddings capture both semantic and lexical meaning. In practice, dense embeddings are weak at exact keyword matching (e.g., specific IDs, acronyms, and proper nouns such as 'HNSW' or 'Order #1234'). A query for 'HNSW' might return results about 'approximate nearest neighbor' but miss the exact documentation page titled 'HNSW'. BM25 excels at exact term matching. Combining the two ranked lists with reciprocal rank fusion (RRF), which scores each document by summing 1/(k + rank) over the lists it appears in, typically improves retrieval recall over dense search alone.
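A minimal sketch of the fusion step, assuming each retriever has already returned a ranked list of document IDs. The function name reciprocal_rank_fusion, the document IDs, and both hit lists are hypothetical and for illustration only. k=60 is a commonly used default for RRF, not a value tuned for any particular corpus.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is a conventional default (assumption).
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by document ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical retriever outputs for the query "HNSW" (IDs are made up).
# The dense retriever returns related pages but misses the exact-title page.
dense_hits = ["ann-overview", "vector-index-guide", "graph-search-intro"]
# The BM25 retriever matches the literal term "HNSW".
bm25_hits = ["hnsw-docs", "hnsw-changelog", "ann-overview"]

fused = reciprocal_rank_fusion([dense_hits, bm25_hits])
print(fused)
```

In this toy input the exact-title page 'hnsw-docs' is absent from the dense results, yet it lands in the top two after fusion because BM25 ranks it first. RRF uses only ranks, so the dense similarity scores and BM25 scores never need to be normalized onto a common scale.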
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T05:13:22.204106+00:00— report_created — created