Report #58649
[counterintuitive] Is dense embedding cosine similarity enough for RAG retrieval?
Implement hybrid search combining dense vector similarity with sparse retrieval (BM25/keyword search) to capture both semantic meaning and exact lexical matches.
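One common way to combine the two result sets is Reciprocal Rank Fusion (RRF), which merges ranked lists without having to normalize BM25 scores and cosine similarities onto the same scale. The sketch below is an illustration, not a prescribed method: the choice of RRF, the constant `k=60` (a frequently used default), and the document IDs are all assumptions. The two input rankings stand in for the top results of a dense retriever and a BM25 retriever.

```python
from collections import defaultdict


def reciprocal_rank_fusion(ranked_lists, k=60):
    """Fuse ranked lists of doc IDs with Reciprocal Rank Fusion.

    Each document scores sum(1 / (k + rank)) over every list it appears in,
    with 1-based ranks. k=60 is an assumed, commonly used default; tune it.
    """
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc ID so output is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical retriever outputs for a query containing an exact product ID.
dense_ranking = ["doc_similar_product", "doc_exact_sku", "doc_other"]
sparse_ranking = ["doc_exact_sku", "doc_manual"]  # BM25 hits the exact ID

fused = reciprocal_rank_fusion([dense_ranking, sparse_ranking])
print(fused)
```

Because `doc_exact_sku` appears near the top of both lists, it overtakes `doc_similar_product`, which only the dense retriever ranked first. Documents found by only one retriever still survive the fusion, so neither signal is discarded.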
Journey Context:
Many developers assume dense embeddings capture all meaning, so they drop traditional keyword search. Dense models often struggle with exact matches on specific IDs, names, or rare acronyms because of tokenization granularity: a rare string is split into subword pieces that carry little distinctive signal in the resulting vector. A search for a specific product ID might therefore return results for similar products instead of the one matching the exact string. Hybrid search leverages the strengths of both: BM25 for exact lexical overlap and dense vectors for semantic matches such as synonyms.
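To show why the lexical side catches exact IDs, here is a minimal Okapi BM25 scorer written from scratch. It assumes the common defaults `k1=1.5` and `b=0.75`, a Lucene-style non-negative IDF, and a hypothetical tokenizer that keeps hyphenated IDs such as `SKU-4471-B` as single tokens. In practice you would use a search engine or an established BM25 library rather than this sketch. Because the ID is a rare token, its high IDF separates the exact product from a near-identical one that a dense model might score almost the same.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Assumed tokenizer: lowercase, keep hyphenated IDs ("sku-4471-b") whole.
    return re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", text.lower())


class BM25:
    def __init__(self, docs, k1=1.5, b=0.75):
        self.k1, self.b = k1, b
        self.doc_tokens = [tokenize(d) for d in docs]
        self.doc_lens = [len(t) for t in self.doc_tokens]
        self.avgdl = sum(self.doc_lens) / len(docs)
        self.tfs = [Counter(t) for t in self.doc_tokens]
        # Document frequency: how many docs contain each term at least once.
        df = Counter(term for tokens in self.doc_tokens for term in set(tokens))
        n = len(docs)
        # Non-negative IDF variant; rare terms such as exact IDs get high weight.
        self.idf = {t: math.log(1 + (n - f + 0.5) / (f + 0.5)) for t, f in df.items()}

    def score(self, query, i):
        tf, dl = self.tfs[i], self.doc_lens[i]
        total = 0.0
        for term in tokenize(query):
            f = tf.get(term, 0)
            if f == 0:
                continue  # BM25 only rewards exact token overlap
            norm = f + self.k1 * (1 - self.b + self.b * dl / self.avgdl)
            total += self.idf[term] * f * (self.k1 + 1) / norm
        return total

    def rank(self, query):
        scores = [self.score(query, i) for i in range(len(self.doc_tokens))]
        return sorted(range(len(scores)), key=lambda i: -scores[i])


# Hypothetical catalog: docs 0 and 1 differ only in the final ID character.
docs = [
    "Replacement filter for model SKU-4471-B air purifier",
    "Replacement filter for model SKU-4471-C air purifier",
    "Cleaning guide for air purifiers and filters",
]
bm25 = BM25(docs)
query = "SKU-4471-B filter"
ranking = bm25.rank(query)
print(ranking)
```

Only document 0 contains the token `sku-4471-b`, so it ranks first; document 1 still scores on the shared word `filter`, and document 2 scores zero because `filters` is not an exact token match. That last case is where the dense side of a hybrid setup is expected to help.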
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T04:55:58.340712+00:00— report_created — created