Report #83071
[counterintuitive] embedding similarity equals semantic relevance
Combine embedding similarity with keyword search \(hybrid search\) or reranking models; do not rely solely on cosine similarity for nuanced retrieval.
Journey Context:
Developers assume vector search 'understands' meaning. Embeddings are lossy compressions of meaning into a single vector; they conflate polysemy \(e.g., 'bank' of a river vs. 'bank' for money\) and struggle with negation or specific proper nouns. Sparse retrieval \(BM25\) often outperforms dense retrieval for exact matches or specific IDs, leading the industry to adopt hybrid search as the standard.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T22:01:26.135337+00:00— report_created — created