Report #90457
[counterintuitive] embedding similarity guarantees semantic relevance
Use hybrid search (combining dense vector embeddings with sparse lexical retrieval like BM25) rather than pure semantic search for production RAG systems.
Journey Context:
Developers often assume that dense vector embeddings fully capture meaning, and therefore that cosine similarity is a sufficient retrieval metric on its own. In practice, dense embeddings are lossy compressions: they often fail to retrieve documents containing exact names, IDs, acronyms, or specific code syntax, because such tokens lack broad semantic neighbors. A query for 'HNSW' might retrieve documents about 'approximate nearest neighbor' search but miss the paper that actually introduces 'HNSW'. Pure semantic search therefore falls short on lexical precision. Hybrid search combines the semantic matching of dense vectors with the exact-term matching of sparse lexical retrieval such as BM25.
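The following is a minimal sketch of the hybrid idea, not a production implementation. It assumes reciprocal rank fusion (RRF) as the merge step, which is one common choice and is not prescribed by this report. The four documents and the hand-written 2-d vectors are hypothetical stand-ins for a real corpus and a real embedding model, chosen so that the dense ranking buries the document containing the literal token 'hnsw'. The BM25 scorer is a basic textbook variant with assumed defaults k1=1.5 and b=0.75.

```python
import math
from collections import Counter

def tokenize(text):
    return text.lower().split()

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score every document against the query with a basic BM25 formula."""
    doc_tokens = [tokenize(d) for d in docs]
    n = len(doc_tokens)
    avg_len = sum(len(t) for t in doc_tokens) / n
    doc_freq = Counter(term for t in doc_tokens for term in set(t))
    scores = []
    for tokens in doc_tokens:
        tf = Counter(tokens)
        score = 0.0
        for term in tokenize(query):
            if tf[term] == 0:
                continue
            idf = math.log(1 + (n - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

def cosine(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v)))

def rrf(rankings, k=60):
    """Reciprocal rank fusion: sum 1 / (k + rank) over every ranking a doc appears in."""
    fused = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] = fused.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(fused, key=fused.get, reverse=True)

# Hypothetical toy corpus.
docs = [
    "approximate nearest neighbor search with graph indexes",
    "vector databases and similarity search overview",
    "hnsw hierarchical navigable small world graphs",
    "baking sourdough bread at home",
]
# Hand-written vectors standing in for an embedding model (assumption):
# the query lands closest to the generic ANN documents, not the HNSW one.
doc_vecs = [[0.9, 0.1], [0.8, 0.3], [0.6, 0.6], [0.0, 1.0]]
query, query_vec = "hnsw", [1.0, 0.0]

# Dense ranking: all documents ordered by cosine similarity.
dense_scores = [cosine(query_vec, v) for v in doc_vecs]
dense_ranking = sorted(range(len(docs)), key=lambda i: -dense_scores[i])

# Sparse ranking: only documents that share at least one query term.
sparse_scores = bm25_scores(query, docs)
sparse_ranking = [i for i in sorted(range(len(docs)), key=lambda i: -sparse_scores[i])
                  if sparse_scores[i] > 0]

fused = rrf([dense_ranking, sparse_ranking])
print("dense :", dense_ranking)
print("sparse:", sparse_ranking)
print("fused :", fused)
```

With these toy inputs, the document that contains the literal query token ranks third under dense similarity alone and first after fusion. In a real system the dense ranking would come from a vector index and the sparse ranking from a search engine's BM25 implementation; only the fusion step would look roughly like `rrf` above.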
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T10:25:41.319384+00:00— report_created — created