Report #61343
[counterintuitive] Is cosine similarity of embeddings a reliable measure of semantic relevance for RAG?
Use embedding similarity for initial retrieval, but apply a cross-encoder/reranker model to evaluate true semantic relevance before passing chunks to the LLM.
Journey Context:
Developers often treat vector similarity search as the final word on relevance. But an embedding compresses a whole passage into a single vector, losing nuance. Cosine similarity can be high because of shared vocabulary or topic overlap even when the chunk does not answer the specific query. Bi-encoders, which embed the query and each chunk separately, are fast but fuzzy; cross-encoders, which score each query-chunk pair jointly, are slow but precise.
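A minimal sketch of the two-stage pattern described above, using only the standard library. The function names (`cosine`, `retrieve_then_rerank`, `toy_pair_scorer`), the hand-written vectors, and the `top_k`/`top_n` parameters are all illustrative assumptions, not part of the original report. The toy scorer stands in for a real cross-encoder; in practice `pair_scorer` would likely wrap a reranker model that scores (query, chunk) pairs.

```python
import math
from typing import Callable, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve_then_rerank(
    query: str,
    query_vec: Sequence[float],
    chunks: List[str],
    chunk_vecs: List[Sequence[float]],
    pair_scorer: Callable[[str, str], float],
    top_k: int = 20,
    top_n: int = 3,
) -> List[str]:
    # Stage 1: cheap recall. Rank all chunks by cosine similarity
    # and keep the top_k candidates.
    by_cosine = sorted(
        range(len(chunks)),
        key=lambda i: cosine(query_vec, chunk_vecs[i]),
        reverse=True,
    )
    candidates = by_cosine[:top_k]

    # Stage 2: precise rerank. Score each (query, chunk) pair jointly
    # and keep the top_n chunks to pass to the LLM.
    reranked = sorted(
        candidates,
        key=lambda i: pair_scorer(query, chunks[i]),
        reverse=True,
    )
    return [chunks[i] for i in reranked[:top_n]]


def toy_pair_scorer(query: str, chunk: str) -> float:
    # Hypothetical stand-in for a cross-encoder: a real one would read
    # the query and chunk together and return a relevance score.
    return 1.0 if "capital" in chunk.lower() else 0.0


# Toy data: chunk 0 shares the topic with the query (high cosine)
# but only chunk 1 actually answers it.
query = "What is the capital of France?"
query_vec = [1.0, 0.0]
chunks = [
    "Paris travel tips: museums, cafes, and day trips around France.",
    "The capital of France is Paris.",
    "Sourdough starters need regular feeding.",
]
chunk_vecs = [[0.9, 0.1], [0.7, 0.7], [0.0, 1.0]]

# With a constant scorer, stage 2 leaves the cosine order unchanged.
by_cosine_only = retrieve_then_rerank(
    query, query_vec, chunks, chunk_vecs, lambda q, c: 0.0, top_k=2, top_n=1
)
reranked = retrieve_then_rerank(
    query, query_vec, chunks, chunk_vecs, toy_pair_scorer, top_k=2, top_n=1
)
print("cosine only:", by_cosine_only)
print("after rerank:", reranked)
```

The design choice is that `top_k` controls recall (how many candidates the slow scorer sees) and `top_n` controls how much context reaches the LLM. If the answering chunk falls outside `top_k`, the reranker cannot recover it, so `top_k` is usually set well above `top_n`.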
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T09:26:59.374409+00:00— report_created — created