Report #91921
[counterintuitive] Is dense vector embedding similarity enough for RAG retrieval?
Use hybrid search (combining dense embeddings with sparse retrieval like BM25) to handle both semantic similarity and exact keyword/ID matching.
Journey Context:
Developers often assume semantic search replaces keyword search. But dense embeddings often fail at exact matches (product IDs, specific names, acronyms). A search for 'HNSW' might return a document about 'approximate nearest neighbor' search but miss one that explicitly defines 'HNSW', if the embedding model places the acronym far from that document in vector space. BM25 scores documents on exact token overlap, so a document containing the literal query term is reliably surfaced, while dense vectors capture conceptual overlap. You need both for robust retrieval.
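A minimal sketch of the idea, under stated assumptions: the BM25 scorer is a small from-scratch version (not any particular library's API), the dense ranking is hard-coded as a hypothetical stand-in for what an embedding model might return, and reciprocal rank fusion (RRF) is one common way to merge the two lists, chosen here for illustration rather than taken from the report. The names `bm25_scores`, `matching_ranking`, and `rrf_fuse` are illustrative.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive whitespace tokenizer; real systems use a proper analyzer.
    return text.lower().split()


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score every document against the query with a minimal BM25."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    avgdl = sum(len(t) for t in tokenized) / n
    # Document frequency: number of documents containing each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def matching_ranking(scores):
    """Doc ids with a positive score, best first (non-matches are dropped)."""
    hits = [i for i, s in enumerate(scores) if s > 0]
    return sorted(hits, key=lambda i: scores[i], reverse=True)


def rrf_fuse(rankings, k=60):
    """Reciprocal rank fusion: sum 1 / (k + rank) over all input rankings."""
    fused = Counter()
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] += 1.0 / (k + rank)
    # Sort by fused score, breaking ties by doc id for determinism.
    return sorted(fused, key=lambda d: (-fused[d], d))


docs = [
    "approximate nearest neighbor search trades accuracy for speed",
    "hnsw is a graph index for approximate nearest neighbor search",
    "bm25 ranks documents by term frequency and inverse document frequency",
]
query = "hnsw"

# Sparse side: only the document containing the literal token matches.
sparse_rank = matching_ranking(bm25_scores(query, docs))

# Dense side: HYPOTHETICAL ranking, assumed for illustration. It puts the
# conceptually related doc 0 first and the doc that defines HNSW last.
dense_rank = [0, 2, 1]

fused_rank = rrf_fuse([dense_rank, sparse_rank])
print(fused_rank)  # [1, 0, 2]: the exact-match document rises to the top
```

In this toy setup the dense list alone would bury the document that defines the acronym, while the fused list puts it first because it appears in both rankings. The RRF constant `k=60` and the BM25 parameters are conventional defaults, not tuned values.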
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T12:52:44.881851+00:00— report_created — created