Report #48761
[counterintuitive] Is cosine similarity of embeddings a reliable measure of semantic relevance?
Use embedding similarity as a preliminary filter, but validate semantic relevance with a cross-encoder or an LLM judge before passing context to the generation model.
Journey Context:
RAG pipelines often rely purely on cosine similarity of dense vector embeddings to retrieve context. An embedding compresses a passage's meaning into a single vector, losing nuance, directional logic, and negation. High cosine similarity often indicates only topical overlap, not that the document answers the specific question. Bi-encoders (the embedding models) encode the query and the document separately, which makes them fast but shallow; cross-encoders score each query-document pair jointly, which makes them slower but deeper. Relying solely on cosine similarity therefore tends to retrieve documents that mention the entities in the query but contradict the desired answer.
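The two-stage pattern described above can be sketched roughly as follows. This is a minimal illustration, not a reference implementation: the vectors are hand-made toy values rather than output from a real embedding model, and `retrieve_then_rerank` and `toy_judge` are hypothetical names introduced here. In particular, `toy_judge` is a crude placeholder standing in for a cross-encoder or an LLM judge; a real pipeline would call an actual model at that step.

```python
import math
from typing import Callable, List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve_then_rerank(
    query: str,
    query_vec: Sequence[float],
    docs: List[str],
    doc_vecs: List[Sequence[float]],
    relevance_scorer: Callable[[str, str], float],
    k: int = 2,
    top_n: int = 1,
) -> List[str]:
    """Stage 1: keep the k nearest documents by cosine similarity.
    Stage 2: reorder those k with a slower, deeper relevance scorer."""
    by_cosine = sorted(
        range(len(docs)),
        key=lambda i: cosine_similarity(query_vec, doc_vecs[i]),
        reverse=True,
    )
    candidates = by_cosine[:k]
    reranked = sorted(
        candidates,
        key=lambda i: relevance_scorer(query, docs[i]),
        reverse=True,
    )
    return [docs[i] for i in reranked[:top_n]]


def toy_judge(query: str, doc: str) -> float:
    """Hypothetical placeholder for a cross-encoder or LLM judge.
    It only checks for the word 'not', which a real scorer would not do."""
    return 0.0 if " not " in f" {doc.lower()} " else 1.0


query = "Which API version supports pagination?"
docs = [
    "The v1 API does not support pagination.",
    "The v2 API supports cursor-based pagination.",
    "Office hours are 9 to 5.",
]
# Toy vectors (assumed, hand-made): the negated document sits closest to the query.
query_vec = [0.9, 0.4, 0.0]
doc_vecs = [
    [0.9, 0.45, 0.0],
    [0.8, 0.6, 0.1],
    [0.0, 0.1, 0.9],
]

cosine_only = max(
    range(len(docs)), key=lambda i: cosine_similarity(query_vec, doc_vecs[i])
)
final_context = retrieve_then_rerank(query, query_vec, docs, doc_vecs, toy_judge)
```

With these toy values, cosine similarity alone ranks the negated v1 document first, while the second stage promotes the v2 document that actually answers the question. The first stage stays cheap because it only compares precomputed vectors; the expensive scorer runs on just the `k` surviving candidates.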
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T12:19:58.734673+00:00— report_created — created