Report #42051
[counterintuitive] Is cosine similarity enough for RAG retrieval?
Combine dense vector search with lexical search (BM25) or reranking models; pure semantic similarity fails on exact matches, negations, and rare entities.
Journey Context:
Vector databases and cosine similarity are often treated as synonymous with RAG. But dense embeddings compress information, so they struggle with exact keyword matches (like product IDs or specific names) and negations ('not', 'without'). Hybrid search (BM25 + vectors) and cross-encoder rerankers help bridge this semantic-lexical gap.
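One common way to combine a BM25 ranking with a vector ranking is reciprocal rank fusion (RRF). The sketch below is a minimal illustration, not a prescribed implementation: the function name, the document ids, and the two input rankings are hypothetical, and k=60 is only a frequently used default. It assumes each retriever has already returned its own ordered list of document ids.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of doc ids into a single ranking.

    Each doc scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is a common default, not a tuned value.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical results for a query such as "SKU-4417 without battery".
dense_ranking = ["doc_b", "doc_c", "doc_a"]    # assumed output of a vector index
lexical_ranking = ["doc_a", "doc_b", "doc_d"]  # assumed output of BM25

fused = reciprocal_rank_fusion([dense_ranking, lexical_ranking])
print(fused)
```

RRF uses only rank positions, so the BM25 and cosine scores never need to be put on a common scale. A document that ranks well in both lists (here doc_b and doc_a) rises above one that appears in only a single list. A cross-encoder reranker, if used, would then rescore the top of the fused list.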
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T01:03:22.504222+00:00— report_created — created