Report #57787
[counterintuitive] Is cosine similarity of embeddings a reliable measure of semantic relevance?
Use embedding similarity for initial retrieval (top-k), but always apply a cross-encoder/reranker model to score actual semantic relevance before passing context to the LLM.
Journey Context:
Developers often treat embedding cosine similarity as a proxy for 'how well this answers the question'. Embeddings are a lossy compression optimized for broad semantic neighborhoods, not precise relevance, so a document that uses the same words as the query but contradicts it can still score a high cosine similarity. Bi-encoder (embedding) retrieval sacrifices precision for speed because query and document are encoded independently; cross-encoders (rerankers) recover much of that precision by attending to both query and document simultaneously.
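The two-stage flow described above can be sketched as follows. This is a minimal illustration, not a production pipeline: `toy_embed` and `toy_pair_score` are hypothetical stand-ins (a bag-of-words vector and a hand-written negation check) for a real bi-encoder and a real cross-encoder, and the query and documents are made up. Only the shape of `retrieve_then_rerank` is the point.

```python
import math
from collections import Counter
from typing import Callable, Dict, List

Vector = Dict[str, float]


def toy_embed(text: str) -> Vector:
    """Toy stand-in for a bi-encoder: a bag-of-words count vector."""
    return dict(Counter(text.lower().split()))


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two sparse vectors."""
    dot = sum(v * b.get(t, 0.0) for t, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def toy_pair_score(query: str, doc: str) -> float:
    """Toy stand-in for a cross-encoder: it sees query and document together.

    Scores word overlap, then subtracts a penalty when exactly one side
    contains a negation word. A real reranker learns this; it is hard-coded here.
    """
    q, d = set(query.lower().split()), set(doc.lower().split())
    overlap = len(q & d) / len(q) if q else 0.0
    negations = {"not", "never", "no"}
    mismatch = bool(q & negations) != bool(d & negations)
    return overlap - 1.0 if mismatch else overlap


def retrieve_then_rerank(
    query: str,
    docs: List[str],
    embed: Callable[[str], Vector],
    pair_score: Callable[[str, str], float],
    top_k: int,
    top_n: int,
) -> List[str]:
    """Stage 1: cheap cosine top-k. Stage 2: rerank those k with pair_score."""
    q_vec = embed(query)
    candidates = sorted(docs, key=lambda d: cosine(q_vec, embed(d)), reverse=True)[:top_k]
    return sorted(candidates, key=lambda d: pair_score(query, d), reverse=True)[:top_n]


# Made-up example data: the first document contradicts the query
# while sharing every one of its words.
query = "the api supports batch requests"
docs = [
    "the api never supports batch requests",
    "batch requests are supported by the api",
    "bananas are rich in potassium",
]

q_vec = toy_embed(query)
by_cosine = sorted(docs, key=lambda d: cosine(q_vec, toy_embed(d)), reverse=True)
reranked = retrieve_then_rerank(query, docs, toy_embed, toy_pair_score, top_k=2, top_n=1)
```

With this toy data, cosine alone ranks the contradicting document first because it shares every query word, while the reranking stage promotes the document that agrees with the query. In a real system the two callables would wrap an embedding model and a reranker model, and `top_k` would typically be much larger than `top_n`.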
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T03:29:01.747700+00:00— report_created — created