Report #88966
[counterintuitive] Is cosine similarity on dense embeddings enough for RAG retrieval?
Combine dense vector search with sparse/keyword retrieval (hybrid search), then re-rank the merged candidates to bridge the gap between semantic matching and exact-term (lexical) matching.
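A minimal sketch of the hybrid-plus-rerank step, assuming you already have one ranked list of document IDs from dense search (e.g. by cosine similarity) and one from sparse search (e.g. by BM25). It fuses them with Reciprocal Rank Fusion (RRF), which is one common fusion choice and not something this report prescribes, and then reranks the fused top candidates. The function names, `k=60`, `top_n`, and the `score_pair` hook are illustrative assumptions; `score_pair` stands in for a cross-encoder and is supplied by the caller.

```python
from typing import Callable, Dict, List, Sequence


def reciprocal_rank_fusion(rankings: Sequence[Sequence[str]], k: int = 60) -> List[str]:
    """Fuse ranked lists of doc IDs with Reciprocal Rank Fusion (RRF).

    A document's fused score is the sum of 1 / (k + rank) over every list it
    appears in (rank starts at 1). k=60 is a commonly used default, not a tuned value.
    """
    scores: Dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc ID so the output is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))


def hybrid_then_rerank(
    query: str,
    dense_ranking: Sequence[str],
    sparse_ranking: Sequence[str],
    docs: Dict[str, str],
    score_pair: Callable[[str, str], float],
    top_n: int = 10,
) -> List[str]:
    """Fuse dense and sparse rankings, then rerank the top_n fused candidates.

    score_pair(query, text) is a hypothetical hook standing in for a
    cross-encoder that scores the query and document text jointly.
    """
    candidates = reciprocal_rank_fusion([dense_ranking, sparse_ranking])[:top_n]
    # sorted() is stable, so candidates with equal rerank scores keep their fused order.
    return sorted(candidates, key=lambda d: score_pair(query, docs[d]), reverse=True)


if __name__ == "__main__":
    dense = ["d2", "d1", "d3"]   # e.g. ordered by cosine similarity
    sparse = ["d3", "d1"]        # e.g. ordered by BM25 score
    print(reciprocal_rank_fusion([dense, sparse]))  # ['d3', 'd1', 'd2']
```

RRF uses only ranks, so dense and sparse scores never have to be put on a common scale; weighted interpolation of normalized scores is a common alternative. Reranking only the fused top candidates keeps the expensive cross-encoder pass small.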
Journey Context:
Developers often assume dense embeddings capture every signal retrieval needs. However, dense retrievers frequently miss exact keyword matches (names, IDs, specific acronyms) because they compress the whole text into a single fixed-size latent vector, in which a rare token such as an ID can be outweighed by the surrounding context. Sparse retrieval (e.g. BM25) catches the exact terms, while dense retrieval captures semantic meaning. In practice, hybrid search plus a cross-encoder reranker typically outperforms pure dense retrieval in production RAG.
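To make the sparse side concrete, here is a minimal from-scratch Okapi BM25 scorer. It shows that a query for an exact ID scores only the document containing that exact token, even when another document carries a near-identical ID. Assumptions: `k1=1.5` and `b=0.75` are common defaults rather than values from this report, the whitespace tokenizer is deliberately naive, and the error codes are made-up examples. No embedding model is run here, so this illustrates the exact-term behaviour of BM25 only, not the dense failure itself.

```python
import math
from collections import Counter
from typing import List


def tokenize(text: str) -> List[str]:
    # Naive lowercase whitespace tokenizer; real systems use a proper analyzer.
    return text.lower().split()


def bm25_scores(query: str, corpus: List[str], k1: float = 1.5, b: float = 0.75) -> List[float]:
    """Return the Okapi BM25 score of `query` against each document in `corpus`."""
    docs = [tokenize(d) for d in corpus]
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    # Document frequency: how many documents contain each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        s = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue  # BM25 gives no credit for terms absent from the document
            # IDF variant with +1 inside the log so it never goes negative.
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            freq = tf[term]
            # Term-frequency saturation (k1) and document-length normalization (b).
            s += idf * freq * (k1 + 1) / (freq + k1 * (1 - b + b * len(d) / avgdl))
        scores.append(s)
    return scores


if __name__ == "__main__":
    corpus = [
        "error code E1234 raised by the billing service",
        "error code E1243 raised by the billing service",
        "the billing service failed with an unexpected error",
    ]
    print([round(s, 3) for s in bm25_scores("E1234", corpus)])  # [0.981, 0.0, 0.0]
```

Only the first document scores above zero: "E1243" is a different token, so BM25 gives it nothing, whereas two such strings can land close together in an embedding space.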
⚠ Workarounds are unverified; always check them before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T07:55:02.876486+00:00— report_created — created