Report #82346
[counterintuitive] Does high cosine similarity mean the text is semantically relevant?
Use hybrid search (combining keyword/BM25 and vector search) and apply cross-encoders (rerankers) after initial retrieval. Do not rely solely on embedding cosine similarity for relevance.
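One way to sketch the hybrid step is below. It assumes the keyword (BM25) search and the vector search have each already produced a ranked list of document IDs, and it merges them with reciprocal rank fusion, which is one common fusion method and not something this report prescribes. The names `reciprocal_rank_fusion`, `hybrid_search`, and the `rerank` callable are hypothetical; in practice `rerank` would wrap a cross-encoder.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Optional


def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Fuse several ranked lists of document IDs into one ranked list.

    Each document earns 1 / (k + rank) from every list it appears in,
    so documents ranked well by both retrievers rise to the top.
    """
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by document ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


def hybrid_search(
    keyword_hits: List[str],
    vector_hits: List[str],
    rerank: Optional[Callable[[List[str]], List[str]]] = None,
    top_n: int = 3,
) -> List[str]:
    """Fuse both rankings, keep the top candidates, then optionally rerank them.

    `rerank` is a hypothetical hook (assumption): it receives the candidate
    IDs and returns them reordered, e.g. by cross-encoder score.
    """
    candidates = reciprocal_rank_fusion([keyword_hits, vector_hits])[:top_n]
    if rerank is not None:
        candidates = rerank(candidates)
    return candidates


if __name__ == "__main__":
    # Illustrative IDs only; real lists would come from BM25 and a vector index.
    print(hybrid_search(["d3", "d1", "d2"], ["d1", "d2", "d4"]))
```

Reranking only the fused shortlist keeps the slow cross-encoder off the full corpus, which is the usual reason for putting it after initial retrieval.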
Journey Context:
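Developers often assume vector search replaces keyword search because embeddings understand meaning. Cosine similarity over standard dense embeddings captures general topical similarity, but it often misses precise lexical matches (names, IDs, specific acronyms). It can also score antonyms or unrelated text highly: cosine similarity measures only the angle between vectors, not their norms, and text that appears in similar contexts points in a similar direction even when its meaning is opposite or irrelevant. Bi-encoder embeddings are fast but fuzzy, because the query and the document are encoded separately; cross-encoders are slow but precise, because they score each query-document pair jointly.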
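The lexical side of the problem can be illustrated with a minimal sketch, assuming documents are held in a plain dictionary of ID to text. The helper names `tokens` and `exact_match_ids` and the sample strings are hypothetical. The check returns only documents that contain every query token verbatim, which is the kind of signal a keyword index contributes for identifiers that differ by a single character.

```python
import re
from typing import Dict, List, Set

# Keep hyphens and underscores inside tokens so IDs like "ERR-4012" stay whole.
TOKEN = re.compile(r"[A-Za-z0-9][A-Za-z0-9_\-]*")


def tokens(text: str) -> Set[str]:
    """Lowercased set of word-like tokens found in the text."""
    return {t.lower() for t in TOKEN.findall(text)}


def exact_match_ids(query: str, docs: Dict[str, str]) -> List[str]:
    """Return the IDs of documents that contain every query token verbatim."""
    wanted = tokens(query)
    if not wanted:
        return []
    return sorted(doc_id for doc_id, text in docs.items() if wanted <= tokens(text))


if __name__ == "__main__":
    # Illustrative texts (assumption): near-identical wording, different error codes.
    docs = {
        "a": "Timeout while calling payment API, code ERR-4012.",
        "b": "Timeout while calling payment API, code ERR-4021.",
    }
    print(exact_match_ids("ERR-4012", docs))
```

Two documents like these are nearly identical in wording, so a dense embedding may well place them close together; the exact-token check separates them regardless.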
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T20:48:30.140348+00:00— report_created — created