Report #82735
[counterintuitive] Is cosine similarity the best metric for RAG retrieval relevance?
Combine dense vector similarity with sparse retrieval (BM25) in a hybrid search, then use a cross-encoder or LLM-based reranker to score the candidates for true relevance before passing them to the generator.
Journey Context:
Developers often treat cosine similarity between embeddings as a proxy for semantic relevance. But an embedding compresses meaning into a single vector and loses nuance. High similarity often means only that two texts share a topic or vocabulary, not that the chunk answers the specific question. For example, a chunk saying 'Apple's revenue decreased' and one saying 'Apple's revenue increased' can have nearly identical embeddings while giving opposite answers. Hybrid search (BM25 + dense) and reranking mitigate this: BM25 rewards exact term matches, and a reranker scores each query-chunk pair jointly, which narrows the gap between semantic similarity and task relevance.
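The hybrid-plus-rerank idea can be sketched in a few lines of standard-library Python. This is a minimal illustration, not a production pipeline, and it rests on several assumptions that are not in the report: the two rankings are merged with reciprocal rank fusion (one common fusion choice among several), the embeddings are toy vectors supplied by the caller, and `rerank_fn` is a hypothetical callable standing in for a cross-encoder or LLM-based reranker. All function names are illustrative.

```python
import math
from collections import Counter

def tokenize(text):
    # Naive whitespace tokenizer; real systems use a proper analyzer.
    return text.lower().split()

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each doc against the query with the Okapi BM25 formula."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    avgdl = sum(len(t) for t in tokenized) / n
    df = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def ranking(scores):
    """Doc indices ordered from best to worst score."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

def rrf_fuse(rankings, k=60):
    """Reciprocal rank fusion: each ranking adds 1 / (k + rank) per doc."""
    fused = Counter()
    for r in rankings:
        for rank, doc_id in enumerate(r, start=1):
            fused[doc_id] += 1.0 / (k + rank)
    # Ties are broken by doc index so the output is deterministic.
    return [d for d, _ in sorted(fused.items(), key=lambda x: (-x[1], x[0]))]

def hybrid_retrieve(query, query_vec, docs, doc_vecs, rerank_fn=None, top_k=3):
    sparse = ranking(bm25_scores(query, docs))
    dense = ranking([cosine(query_vec, v) for v in doc_vecs])
    candidates = rrf_fuse([sparse, dense])[:top_k]
    if rerank_fn is not None:
        # rerank_fn(query, doc_text) -> float is a hypothetical stand-in
        # for a cross-encoder or LLM relevance score.
        candidates = sorted(candidates,
                            key=lambda i: rerank_fn(query, docs[i]),
                            reverse=True)
    return candidates
```

In use, the caller would pass real embeddings for `query_vec` and `doc_vecs` and a real model behind `rerank_fn`. The design point is that fusion only builds a candidate pool; the reranker, which reads the query and the chunk together, makes the final relevance call.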
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T21:27:34.230605+00:00 — report_created — created