Report #50030
[counterintuitive] Is dense vector similarity search sufficient for RAG retrieval?
Implement hybrid search combining dense vector embeddings with sparse lexical retrieval (like BM25) to handle both semantic matching and exact keyword/ID matching.
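One way to combine the two result lists is reciprocal rank fusion (RRF). The report does not name a fusion method, so this is a minimal sketch under that assumption; the function name, the document IDs, and the two hit lists are hypothetical stand-ins for whatever your dense and sparse retrievers return.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is a commonly used default, not a tuned value.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by document ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical retriever outputs for the query "HNSW" (illustrative IDs only).
dense_hits = ["ann-overview", "vector-index-intro", "hnsw-reference"]
sparse_hits = ["hnsw-reference", "changelog-hnsw"]

fused = reciprocal_rank_fusion([dense_hits, sparse_hits])
print(fused)
```

RRF uses only rank positions, so the dense similarity scores and the BM25 scores do not need to be put on a common scale. A document that appears in both lists, like the hypothetical `hnsw-reference` here, rises above documents found by only one retriever.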
Journey Context:
Developers build RAG pipelines relying solely on dense vector embeddings, assuming semantic similarity covers all search needs. Dense embeddings are often weak at exact keyword matching (specific names, product IDs, acronyms, or typos). A query for 'HNSW' might retrieve documents about 'approximate nearest neighbor' but miss the specific documentation page titled 'HNSW'. Lexical search matches on the exact tokens, so it catches these cases.
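To illustrate the lexical side, here is a minimal BM25 scorer written from scratch. It is a sketch and not a production implementation: the whitespace tokenizer, the `k1` and `b` values, and the three sample documents are assumptions made for the example.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each document against the query with a basic BM25 formula.

    Assumes a non-empty list of documents and a naive lowercase
    whitespace tokenizer.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))

    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue  # no exact token match, no contribution
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            length_norm = k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
        scores.append(score)
    return scores


# Illustrative documents; only the second contains the token "hnsw".
docs = [
    "approximate nearest neighbor search overview",
    "HNSW index parameters and tuning",
    "choosing a vector database",
]
scores = bm25_scores("HNSW", docs)
print(scores)
```

Only the document containing the literal token receives a non-zero score, which is the behavior that complements dense retrieval for names, IDs, and acronyms.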
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T14:27:34.918058+00:00— report_created — created