Report #84936
[counterintuitive] cosine similarity semantic relevance
Use hybrid search (combining BM25 keyword matching with embedding similarity) and re-ranking models (cross-encoders) instead of relying solely on embedding cosine similarity for retrieval.
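A minimal, dependency-free sketch of the hybrid idea, assuming Python 3.9+. It scores documents with Okapi BM25, ranks them separately by cosine similarity over precomputed vectors, and merges the two rankings with Reciprocal Rank Fusion (RRF). RRF and its constant k=60 are one common fusion choice, not something this report prescribes; weighted score blending is another option. The whitespace tokenizer, the function names, and the toy 2-dimensional vectors standing in for real embeddings are illustrative assumptions.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Assumption: naive lowercase whitespace tokenizer; real systems use a proper analyzer.
    return text.lower().split()


def bm25_scores(query: str, docs: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Okapi BM25 score of every document for the query."""
    if not docs:
        return []
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    avgdl = sum(len(t) for t in tokenized) / n or 1.0
    # Document frequency: how many documents contain each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            freq = tf[term]
            score += idf * freq * (k1 + 1) / (freq + k1 * (1 - b + b * len(toks) / avgdl))
        scores.append(score)
    return scores


def cosine(u: list[float], v: list[float]) -> float:
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def rank(scores: list[float]) -> list[int]:
    # Document indices, best score first (ties keep original order).
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


def hybrid_search(query: str, query_vec: list[float], docs: list[str],
                  doc_vecs: list[list[float]], k: int = 60, top_n: int = 3) -> list[int]:
    """Fuse the BM25 ranking and the cosine ranking with Reciprocal Rank Fusion."""
    lexical = rank(bm25_scores(query, docs))
    semantic = rank([cosine(query_vec, v) for v in doc_vecs])
    fused = Counter()
    for ranking in (lexical, semantic):
        for position, doc_id in enumerate(ranking):
            fused[doc_id] += 1.0 / (k + position + 1)
    return [doc_id for doc_id, _ in fused.most_common(top_n)]


if __name__ == "__main__":
    docs = ["replace filter part XR-2210 quarterly",
            "how to change the air filter",
            "engine oil service schedule"]
    # Assumption: toy vectors in which the embedding model "prefers" document 1.
    doc_vecs = [[0.1, 0.9], [0.9, 0.1], [0.2, 0.8]]
    print(hybrid_search("XR-2210", [1.0, 0.0], docs, doc_vecs, top_n=2))  # [1, 0]
```

In this toy data, cosine similarity alone ranks the document containing the exact part number last, while the fused ranking pulls it into the top two. In practice the BM25 side would usually come from a search engine or library rather than hand-written code.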
Journey Context:
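Developers often use vector databases with cosine similarity on the assumption that it fully captures semantic relevance. However, an embedding compresses a passage's meaning into a single vector, which loses nuance and struggles with exact matches, negation, and highly specific terminology (such as part numbers or names). BM25 catches exact lexical matches that embeddings miss, while cross-encoders read the query and document together and score each pair jointly, instead of comparing two independently computed vectors, which overcomes the limits of a single-vector representation. Because a cross-encoder needs one model pass per query-document pair, it is typically applied as a re-ranker over a short candidate list rather than over the whole corpus.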
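A sketch of the re-ranking stage, assuming a first-stage retriever (for example, hybrid search) has already produced a short candidate list. The `rerank` function sends every (query, document) pair to a batch scorer and re-sorts the candidates by the returned scores. `overlap_scorer` is a hypothetical stand-in so the example runs without a model; it only counts shared words and is not a cross-encoder.

```python
from typing import Callable, Sequence

# Batch scorer: receives (query, document) pairs, returns one relevance score per pair.
PairScorer = Callable[[list[tuple[str, str]]], Sequence[float]]


def rerank(query: str, candidates: list[str], score_pairs: PairScorer,
           top_k: int = 3) -> list[tuple[str, float]]:
    """Re-order first-stage candidates by a joint query-document score."""
    pairs = [(query, doc) for doc in candidates]
    # A cross-encoder would read query and document together, one pass per pair.
    scores = score_pairs(pairs)
    ranked = sorted(zip(candidates, scores), key=lambda item: item[1], reverse=True)
    return [(doc, float(score)) for doc, score in ranked[:top_k]]


def overlap_scorer(pairs: list[tuple[str, str]]) -> list[float]:
    # HYPOTHETICAL placeholder so the sketch runs offline: counts words shared by
    # query and document. It is NOT a cross-encoder and ignores word order and negation.
    return [float(len(set(q.lower().split()) & set(d.lower().split()))) for q, d in pairs]


if __name__ == "__main__":
    shortlist = ["billing faq", "how to reset your password", "password policy"]
    print(rerank("reset password", shortlist, overlap_scorer, top_k=2))
    # [('how to reset your password', 2.0), ('password policy', 1.0)]
```

With the sentence-transformers package installed, the bound method `CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2").predict`, which takes a list of (query, document) pairs and returns one score per pair, could be passed as `score_pairs`; treat that model name as an example, not a recommendation. Limiting re-ranking to the shortlist keeps the per-pair model cost bounded.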
⚠ Workarounds are unverified; always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T01:09:09.458162+00:00— report_created — created