Report #92965
[counterintuitive] Is cosine similarity of embeddings a perfect measure of semantic relevance?
Combine embedding similarity with keyword search (hybrid search) or re-ranking models. Do not rely solely on vector similarity for retrieval decisions.
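One common way to do the combination is reciprocal rank fusion (RRF). Below is a minimal sketch. It assumes you already have two ranked lists of document IDs: one from a vector index ordered by cosine similarity and one from a keyword index such as BM25. The function name, the document IDs and both rankings are hypothetical. `k = 60` is the constant conventionally used with RRF, not a value taken from this report.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse several ranked lists of document IDs with reciprocal rank fusion.

    Each document earns 1 / (k + rank) from every list it appears in
    (rank starts at 1); documents are returned sorted by total score.
    """
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical rankings: one from cosine similarity over embeddings,
# one from a BM25 keyword index.
vector_ranking = ["d3", "d1", "d2"]
keyword_ranking = ["d1", "d4", "d3"]

fused = reciprocal_rank_fusion([vector_ranking, keyword_ranking])
print([doc_id for doc_id, _ in fused])  # ['d1', 'd3', 'd4', 'd2']
```

RRF looks only at ranks, so it sidesteps the fact that cosine scores and BM25 scores live on different scales. A weighted sum of normalized scores is a common alternative, but it needs per-corpus tuning of the weights.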
Journey Context:
Developers often assume that a high cosine similarity between the embedding vectors of two texts means the texts are semantically relevant to each other. However, an embedding compresses a text's meaning into a single fixed-length vector and often loses nuance, specificity, or negation (e.g., 'I like dogs' and 'I do not like dogs' can have highly similar embeddings). Keyword matching (BM25) catches exact terms that embeddings miss, which often makes hybrid search considerably more robust than vector search alone.
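The following small, self-contained Okapi BM25 scorer makes the keyword side concrete. It is a sketch under simplifying assumptions: a naive lowercase whitespace tokenizer, the common defaults `k1 = 1.5` and `b = 0.75`, and a Lucene-style IDF that stays non-negative. The corpus and the error code `E1234` are made up for illustration. A rare literal token such as an error code gets a high IDF weight, so only the document that contains it scores at all. An embedding model, by contrast, may place all three documents close together because they share the same topic.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Naive tokenizer (assumption): lowercase and split on whitespace.
    return text.lower().split()


def bm25_scores(query: str, docs: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Return the Okapi BM25 score of every document in `docs` for `query`."""
    tokenized = [tokenize(doc) for doc in docs]
    n_docs = len(tokenized)
    avgdl = sum(len(doc) for doc in tokenized) / n_docs
    # Document frequency: how many documents contain each term.
    df = Counter(term for doc in tokenized for term in set(doc))

    scores = []
    for doc in tokenized:
        tf = Counter(doc)
        score = 0.0
        for term in tokenize(query):
            if tf[term] == 0:
                continue  # exact match only: absent terms contribute nothing
            # Lucene-style IDF; the +1 inside the log keeps it non-negative.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation (k1) and document-length normalization (b).
            norm = tf[term] + k1 * (1 - b + b * len(doc) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


# Made-up corpus: all three documents are about restarting a service.
docs = [
    "restart the service after error code E1234 appears",
    "general troubleshooting tips for service failures",
    "how to restart a crashed service",
]
scores = bm25_scores("E1234", docs)
print([round(s, 3) for s in scores])  # only the first document scores above zero
```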
⚠ Workarounds are unverified; always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T14:37:55.176473+00:00— report_created — created