Report #42518
[counterintuitive] Is high cosine similarity in embeddings a reliable measure of semantic relevance for RAG?
Use embedding similarity for initial retrieval (top-k), but always apply a cross-encoder/reranker model to score actual semantic relevance before passing documents to the LLM.
Journey Context:
Developers often use cosine similarity of embeddings as the sole retrieval metric. An embedding compresses a text's meaning into a single vector, which loses nuance. High similarity often means only that the query and document share a topic or phrasing, not that the document answers the specific question. Bi-encoder embeddings are fast but approximate because the query and document are encoded separately; cross-encoders (rerankers) process the query and document jointly, which typically yields higher precision at a higher compute cost per pair.
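A minimal sketch of the retrieve-then-rerank flow described above, using only the Python standard library. The names `retrieve_then_rerank` and `toy_rerank_score`, the sample documents, and the two-dimensional vectors are all hypothetical and made up for illustration. In particular, `toy_rerank_score` is a word-overlap stand-in for a real cross-encoder, not an actual reranker model.

```python
import math
import re
from typing import Callable, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve_then_rerank(
    query: str,
    query_vec: Sequence[float],
    docs: List[str],
    doc_vecs: List[Sequence[float]],
    rerank_score: Callable[[str, str], float],
    top_k: int = 3,
    top_n: int = 1,
) -> List[str]:
    # Stage 1: cheap candidate retrieval by embedding cosine similarity.
    ranked = sorted(
        range(len(docs)),
        key=lambda i: cosine(query_vec, doc_vecs[i]),
        reverse=True,
    )
    candidates = ranked[:top_k]
    # Stage 2: rescore each (query, document) pair jointly and keep the best.
    rescored = sorted(
        candidates,
        key=lambda i: rerank_score(query, docs[i]),
        reverse=True,
    )
    return [docs[i] for i in rescored[:top_n]]


def toy_rerank_score(query: str, doc: str) -> float:
    # ASSUMPTION: placeholder for a real cross-encoder; counts shared words.
    q_words = set(re.findall(r"[a-z]+", query.lower()))
    d_words = set(re.findall(r"[a-z]+", doc.lower()))
    return float(len(q_words & d_words))


# Hypothetical documents and hand-picked 2-D "embeddings" for illustration.
docs = [
    "How to reset your password in the admin console.",
    "Password policy: minimum length and rotation rules.",
    "Quarterly sales figures for the hardware division.",
]
doc_vecs = [[0.9, 0.1], [0.95, 0.05], [0.05, 0.95]]
query = "how do I reset my password"
query_vec = [1.0, 0.0]

# The document that cosine similarity alone would rank first.
cosine_top = docs[max(range(len(docs)), key=lambda i: cosine(query_vec, doc_vecs[i]))]

# The document chosen after reranking the top-2 cosine candidates.
result = retrieve_then_rerank(
    query, query_vec, docs, doc_vecs, toy_rerank_score, top_k=2, top_n=1
)
```

With these made-up vectors, cosine similarity alone ranks the password policy document first because it shares the topic, while the rerank stage promotes the document that actually answers the question. In a real pipeline, the vectors would come from an embedding model and `rerank_score` would wrap a cross-encoder's pair-scoring call.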
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T01:50:16.851775+00:00— report_created — created