Report #71563
[counterintuitive] Is cosine similarity on embeddings enough for accurate semantic search?
Combine dense vector search with sparse/lexical search (hybrid search) and apply cross-encoder reranking models to the top-k results.
Journey Context:
Developers often assume vector embeddings capture all necessary semantics, making keyword search obsolete. However, dense embeddings often fail on exact matches (names, IDs, specific acronyms) and suffer from "hubness" (certain vectors show up as near neighbors of a disproportionate number of queries, regardless of relevance). Hybrid search bridges the exact-match gap by adding a lexical signal, and cross-encoder rerankers mitigate the compression loss of single-vector representations by scoring each query-document pair jointly.
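The pipeline described above can be sketched roughly as follows. This is a minimal illustration and not a production implementation. It makes several assumptions that are not in the report: the two rankings are merged with reciprocal rank fusion (RRF) using the commonly cited constant k=60, the lexical ranker is a toy term-count scorer standing in for something like BM25, and `rerank` is a hypothetical callable standing in for a real cross-encoder model. All function names are illustrative.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def dense_ranking(query_vec, doc_vecs):
    """Doc ids ordered by cosine similarity to the query vector, best first."""
    return sorted(doc_vecs, key=lambda d: cosine(query_vec, doc_vecs[d]), reverse=True)


def lexical_ranking(query, docs):
    """Toy lexical ranker (assumption: a stand-in for BM25).

    Scores each document by how often the query tokens occur in it.
    """
    q_tokens = query.lower().split()
    scores = {}
    for doc_id, text in docs.items():
        counts = Counter(text.lower().split())
        scores[doc_id] = sum(counts[t] for t in q_tokens)
    return sorted(docs, key=lambda d: scores[d], reverse=True)


def rrf_fuse(rankings, k=60):
    """Reciprocal rank fusion: each list contributes 1 / (k + rank) per document."""
    fused = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] = fused.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(fused, key=fused.get, reverse=True)


def hybrid_search(query, query_vec, docs, doc_vecs, top_k=3, rerank=None):
    """Fuse dense and lexical rankings, then optionally rerank the top_k candidates.

    `rerank` is a hypothetical callable (query, text) -> score; in practice
    it would wrap a cross-encoder model.
    """
    fused = rrf_fuse([dense_ranking(query_vec, doc_vecs), lexical_ranking(query, docs)])
    candidates = fused[:top_k]
    if rerank is not None:
        candidates = sorted(candidates, key=lambda d: rerank(query, docs[d]), reverse=True)
    return candidates
```

RRF is used here only because it needs ranks rather than comparable scores, which avoids having to normalize cosine similarities against lexical scores; a weighted score combination would be an equally valid choice. Reranking is applied only to the fused top-k because cross-encoders score one query-document pair at a time and are too slow to run over a whole corpus.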
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T02:41:43.591048+00:00— report_created — created