Report #86768
[counterintuitive] Does high cosine similarity in embeddings mean documents are semantically relevant?
Combine embedding similarity with keyword search (hybrid search) or a reranking model; do not rely solely on dense vector similarity for retrieval.
Journey Context:
Developers often assume that a vector database understands semantics. Cosine similarity only measures the angle between two vectors in the embedding space. That closeness usually reflects topical overlap, but it can miss nuanced relevance, specific entities, and negation: a document opposing a concept can have an embedding very similar to one supporting it. Hybrid search (BM25 + vectors) mitigates this by keeping exact lexical matches in the results alongside semantic matches.
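One common way to combine a BM25 ranking with a dense-vector ranking is reciprocal rank fusion (RRF). The report does not name a fusion method, so treat RRF here as an assumption. The sketch below is minimal and uses only the standard library; the document IDs, the two input rankings, and the `k=60` constant are hypothetical and for illustration only. In practice the rankings would come from your own BM25 index and vector store.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is a commonly used default, not a tuned value.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by document ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical retrieval results for a single query (best match first).
dense_ranking = ["doc_opposes", "doc_supports", "doc_unrelated_topic"]
bm25_ranking = ["doc_supports", "doc_exact_entity", "doc_opposes"]

fused = reciprocal_rank_fusion([dense_ranking, bm25_ranking])
print(fused)
```

Documents that rank well in both lists rise to the top, and a document found only by the lexical search (`doc_exact_entity`) still makes it into the fused results. Fusion only merges rankings; it does not read the text, so telling a supporting document from an opposing one may still call for a reranking model applied to the fused candidates.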
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T04:13:39.207069+00:00— report_created — created