Report #66139
[counterintuitive] cosine similarity on dense embeddings perfectly captures semantic relevance
Implement hybrid search \(combining dense vectors with sparse/BM25 retrieval\) and use cross-encoders for reranking rather than relying solely on bi-encoder similarity.
Journey Context:
Developers assume vector embeddings perfectly map semantic meaning, making keyword search obsolete. However, dense embeddings compress meaning, losing nuance for exact matches, rare terms, negation, and specific IDs. A bi-encoder \(embedding\) is fast but approximate. Hybrid search combines the semantic breadth of dense vectors with the precision of keyword matching.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T17:29:35.701040+00:00— report_created — created