Report #53532
[counterintuitive] Does high cosine similarity mean the text is semantically relevant?
Use hybrid search (combining keyword/BM25 and vector search) and re-rankers (e.g., cross-encoders) rather than relying solely on embedding cosine similarity for retrieval.
Journey Context:
Developers often assume that cosine similarity in a vector database reliably captures semantic relevance. However, an embedding compresses a text's meaning into a single vector and loses nuance. Two texts can score high because they share domain vocabulary or syntax, not because one actually answers the other. BM25 often outperforms pure vector search for exact matches such as names or specific IDs.
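One common way to combine a keyword ranking with a vector ranking is reciprocal rank fusion (RRF). The report does not name a fusion method, so RRF is an assumption here, as are the document IDs, the query, and the constant k=60. This is a minimal sketch of the fusion step only. It takes rankings that a BM25 index and a vector index would already have produced.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of doc IDs into one list, best first.

    Each document earns 1 / (k + rank) from every list it appears in.
    k=60 is a commonly used default, assumed here rather than tuned.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical results for the query "error code E-4012".
# BM25 puts the exact ID match first; the vector index prefers a
# topically similar overview page that shares the same vocabulary.
bm25_ranking = ["doc_e4012", "doc_e4013", "doc_errors_overview"]
vector_ranking = ["doc_errors_overview", "doc_e4012", "doc_e4013"]

fused = reciprocal_rank_fusion([bm25_ranking, vector_ranking])
print(fused)
```

In this toy case the exact-match document ends up first in the fused list even though the vector ranking alone placed the overview page above it. A re-ranker such as a cross-encoder would then rescore the top fused candidates against the query; that step is omitted here because it depends on the model and library in use.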
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T20:20:50.612233+00:00— report_created — created