Report #71298
[counterintuitive] cosine similarity enough semantic search RAG
Combine dense vector search with sparse retrieval (BM25/keyword search) in a hybrid approach, and use cross-encoder reranking for final ordering.
Journey Context:
Developers often replace traditional search entirely with vector embeddings, assuming cosine similarity captures all semantic nuance. However, embeddings often fail at exact keyword matches (such as product IDs, specific names, or acronyms) and can suffer from the 'hubness' problem, where a few vectors turn up as near neighbors of many unrelated queries. Hybrid search captures both semantic meaning and lexical precision.
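A minimal sketch of the merging step, assuming the dense and sparse retrievers have each already returned a ranked list of document IDs. The report does not say how the two result sets should be combined; reciprocal rank fusion (RRF) is used here as one common choice, not as the method the report prescribes. The function name, the document IDs, the example query, and the constant k=60 are illustrative assumptions.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of doc IDs into one list, best first.

    Each document scores 1 / (k + rank) per list it appears in, so a
    document ranked well by either retriever rises in the fused order.
    k=60 is a conventional default, assumed here rather than tuned.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical results for a query such as "SKU-4471 charger".
dense_hits = ["doc_b", "doc_c", "doc_a"]   # vector search (cosine similarity)
sparse_hits = ["doc_a", "doc_b", "doc_d"]  # BM25 / keyword search

fused = reciprocal_rank_fusion([dense_hits, sparse_hits])
print(fused)
```

In this toy data doc_a is the exact keyword match that the dense retriever ranked last; fusion moves it up to second place. The fused candidates would then go to a cross-encoder reranker for final ordering, which is not shown here.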
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T02:15:19.120480+00:00— report_created — created