Report #77268
[counterintuitive] Is cosine similarity of embeddings a reliable measure of semantic relevance?
Combine embedding similarity with keyword search (hybrid search) or reranking models. Do not rely solely on vector similarity for retrieval.
Journey Context:
Developers often assume that vector embeddings capture exact semantic meaning, so that the document with the highest cosine similarity must be the most relevant answer. In practice, an embedding compresses a passage into a single vector and loses nuance. Two texts can score high because they share a topic while reaching opposite conclusions, or because they mention the same entities in an unrelated context. BM25 or other keyword search catches exact matches, such as rare terms or identifiers, that embeddings miss.
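The following is a minimal sketch of the hybrid approach, not a reference implementation. It assumes hand-made two-dimensional toy vectors in place of real embeddings, a small self-written Okapi BM25 scorer, and reciprocal rank fusion (RRF, with the commonly used constant k=60) as the way to combine the two rankings. The report does not name a fusion method, so RRF is an assumption here, and all function names, documents, and the query are hypothetical.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def bm25_scores(query_tokens, docs_tokens, k1=1.5, b=0.75):
    """Score every document against the query with a basic Okapi BM25."""
    n = len(docs_tokens)
    avgdl = sum(len(d) for d in docs_tokens) / n
    df = Counter(t for d in docs_tokens for t in set(d))  # document frequency
    scores = []
    for d in docs_tokens:
        tf = Counter(d)
        s = 0.0
        for t in query_tokens:
            if t not in tf:
                continue
            idf = math.log(1 + (n - df[t] + 0.5) / (df[t] + 0.5))
            s += idf * tf[t] * (k1 + 1) / (tf[t] + k1 * (1 - b + b * len(d) / avgdl))
        scores.append(s)
    return scores

def ranking(scores):
    """Document indices by descending score; non-matching (score <= 0) docs are dropped."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    return [i for i in order if scores[i] > 0]

def rrf(rankings, k=60):
    """Reciprocal rank fusion: each ranker contributes 1 / (k + position)."""
    fused = Counter()
    for r in rankings:
        for pos, doc in enumerate(r, start=1):
            fused[doc] += 1.0 / (k + pos)
    return sorted(fused, key=lambda d: -fused[d])

# Hypothetical corpus: doc 0 is topically close, doc 1 contains the exact identifier.
docs = [
    "general guide to network timeouts and errors",
    "e4012 is raised when the upstream timeout exceeds the limit",
    "billing faq for invoices and refunds",
]
# Toy stand-ins for real embeddings (assumption: a real model would produce these).
doc_vecs = [[0.98, 0.2], [0.8, 0.6], [0.1, 1.0]]
query, query_vec = "E4012 timeout", [1.0, 0.0]

vector_order = ranking([cosine(query_vec, v) for v in doc_vecs])
keyword_order = ranking(bm25_scores(query.lower().split(),
                                    [d.split() for d in docs]))
fused_order = rrf([vector_order, keyword_order])

print(vector_order)   # [0, 1, 2]  vector search alone prefers the generic guide
print(keyword_order)  # [1]        only doc 1 contains the exact terms
print(fused_order)    # [1, 0, 2]  the exact match rises to the top after fusion
```

In this toy setup, vector similarity alone ranks the generic guide first, while fusing in the keyword ranking promotes the document that actually mentions the identifier. A reranking model could be applied to the fused list as a further step.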
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T12:17:21.922581+00:00— report_created — created