Report #43035
[counterintuitive] Is dense vector similarity search enough for RAG retrieval?
Implement hybrid search combining dense embeddings (semantic) with sparse retrieval like BM25 (lexical) to handle exact matches, IDs, and out-of-vocabulary terms.
Journey Context:
Developers often build RAG pipelines using only dense vector embeddings, assuming the embeddings capture everything a query needs. Dense embeddings tend to be weak at exact keyword matching (names, IDs, acronyms, specific error codes) because they compress text into a continuous vector space, where a rare token such as an error code can contribute little to the overall vector. Sparse retrieval (BM25) scores documents on literal term overlap, so it excels at exact term matching. Hybrid search merges the two result sets, which gives robust retrieval across both semantic and lexical queries and reduces missed retrievals on precise identifiers.
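A minimal sketch of the idea, not a definitive implementation. It scores documents with a small pure-Python BM25 and merges that ranking with a dense ranking using reciprocal rank fusion (RRF). RRF is one common fusion choice and is an assumption here; the report does not name a fusion method. The helper names (`tokenize`, `bm25_scores`, `rank_nonzero`, `rrf_fuse`), the sample documents, the parameter values (`k1=1.5`, `b=0.75`, `k=60`), and the hard-coded `dense_ranking` (standing in for the output of a real embedding model) are all hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive whitespace tokenizer; a real pipeline would use a proper analyzer.
    return text.lower().split()


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Return one BM25 score per document (k1 and b are assumed defaults)."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    avgdl = sum(len(toks) for toks in tokenized) / n
    df = Counter()
    for toks in tokenized:
        df.update(set(toks))  # document frequency per term
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def rank_nonzero(scores):
    # Keep only documents that matched at least one query term, best first.
    hits = [i for i, s in enumerate(scores) if s > 0]
    return sorted(hits, key=lambda i: scores[i], reverse=True)


def rrf_fuse(rankings, k=60):
    """Reciprocal rank fusion: sum 1 / (k + rank) over every input ranking."""
    fused = {}
    for ranking in rankings:
        for pos, doc_id in enumerate(ranking, start=1):
            fused[doc_id] = fused.get(doc_id, 0.0) + 1.0 / (k + pos)
    return sorted(fused, key=fused.get, reverse=True)


docs = [
    "Restart the service when error E4012 appears in the logs",
    "How to troubleshoot failures and crashes in the service",
    "Billing and invoices overview",
]
query = "error E4012"

sparse_ranking = rank_nonzero(bm25_scores(query, docs))

# Assumption: a hypothetical embedding model ranks the general
# troubleshooting document above the one containing the exact code.
dense_ranking = [1, 0, 2]

fused = rrf_fuse([sparse_ranking, dense_ranking])
print("sparse:", sparse_ranking)
print("dense: ", dense_ranking)
print("fused: ", fused)
```

In this toy setup only the first document contains the literal token `E4012`, so the sparse side returns it alone, and fusion lifts it above the document the assumed dense ranking preferred. In a real pipeline the dense ranking would come from a vector index, and the fusion constant or a weighted score combination would need tuning against your own queries.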
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T02:42:35.459946+00:00— report_created — created