Report #60921
[counterintuitive] Does high cosine similarity in embeddings guarantee semantic relevance for RAG?
Combine embedding similarity with keyword search (hybrid search) and metadata filtering; do not rely on dense vector distance alone.
Journey Context:
Developers often treat embedding spaces as perfect semantic maps in which distance equals relevance. In practice, dense embeddings compress meaning into fixed-size vectors and lose specificity. Two texts can score a high cosine similarity because they share a topic, use synonyms, or merely have a similar sentence structure, even when the retrieved text lacks the specific entity the query names or reverses its meaning through negation. Sparse retrieval (BM25) catches exact keyword matches that dense embeddings miss, which is why hybrid search is a widely used and more robust approach.
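A minimal sketch of the idea, not a production retriever. Everything in it is an assumption made for illustration: the three documents, the query, and the hand-written 3-dimensional "embeddings" (a real system would get vectors from an embedding model). The fusion step is a weighted sum of min-max normalized scores, which is one common way to combine dense and sparse results; reciprocal rank fusion is another. The BM25 function uses the non-negative IDF variant. Metadata filtering is left out; it would normally narrow the candidate list before scoring.

```python
import math
from collections import Counter

# Hypothetical corpus: documents 0 and 1 differ only in the model number.
docs = [
    "reset the password for model x100 router",
    "reset the password for model x200 router",
    "router firmware update guide",
]
query = "reset password x200"

# Made-up toy vectors standing in for real embeddings (assumption).
doc_vecs = [[0.88, 0.12, 0.0], [0.85, 0.15, 0.05], [0.2, 0.9, 0.3]]
query_vec = [0.9, 0.1, 0.0]


def tokenize(text):
    return text.lower().split()


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Okapi BM25 with the non-negative IDF variant log(1 + (N - df + 0.5) / (df + 0.5))."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n_docs
    # Document frequency: how many documents contain each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            length_norm = k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
        scores.append(score)
    return scores


def min_max(scores):
    """Rescale scores to [0, 1] so dense and sparse scores are comparable."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def ranking(scores):
    """Document indices ordered from best to worst score."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


dense = [cosine(query_vec, v) for v in doc_vecs]
sparse = bm25_scores(query, docs)

alpha = 0.5  # weight of the dense score; a tunable assumption
hybrid = [
    alpha * d + (1 - alpha) * s
    for d, s in zip(min_max(dense), min_max(sparse))
]

dense_rank = ranking(dense)
bm25_rank = ranking(sparse)
hybrid_rank = ranking(hybrid)

print("dense :", dense_rank)
print("bm25  :", bm25_rank)
print("hybrid:", hybrid_rank)
```

With these made-up vectors, the two password documents have almost identical cosine scores and the dense ranking puts the x100 document first. The exact match on `x200` only shows up in the BM25 score, and after fusion the x200 document ranks first.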
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T08:44:41.263604+00:00— report_created — created