Report #83295
[counterintuitive] Is cosine similarity of embeddings a reliable measure of semantic relevance for RAG?
Use embedding similarity as a coarse filter, but pair it with cross-encoder reranking (e.g., Cohere Rerank, BGE-reranker) or LLM-based relevance scoring before injecting chunks into the prompt.
Journey Context:
Developers often assume a vector database returns the 'most relevant' documents because the cosine distance is low. Bi-encoder embeddings compress each text into a single vector, encoded independently of the query, which loses nuance and lexical specificity. They are optimized for search speed, not absolute relevance. Chunks about 'Apple (fruit)' and 'Apple (company)' can end up with similar embeddings depending on the model, leading to irrelevant retrieval. Cross-encoders perform full attention over the query and document as a pair, which typically yields more accurate relevance judgments at the cost of speed, so they are practical only on a small candidate set.
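The two-stage pattern described above can be sketched roughly as follows. This is a minimal illustration, not a reference implementation: `retrieve_then_rerank`, `toy_pair_scorer`, and the hand-written 2-D vectors are all hypothetical names and data made up for the example. The `score_pair` callable is where a real cross-encoder or LLM relevance scorer would plug in; the token-overlap scorer here is only a stand-in so the sketch runs without any model.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve_then_rerank(
    query: str,
    query_vec: Vector,
    chunks: List[str],
    chunk_vecs: List[Vector],
    score_pair: Callable[[str, str], float],
    coarse_k: int = 20,
    final_k: int = 3,
) -> List[str]:
    """Hypothetical helper: coarse cosine filter, then pairwise reranking."""
    # Stage 1: cheap coarse filter using embedding similarity only.
    by_cosine = sorted(
        range(len(chunks)),
        key=lambda i: cosine_similarity(query_vec, chunk_vecs[i]),
        reverse=True,
    )
    candidates = by_cosine[:coarse_k]

    # Stage 2: score each (query, chunk) pair jointly and re-sort.
    # In practice score_pair would wrap a cross-encoder or an LLM judge.
    reranked = sorted(
        candidates,
        key=lambda i: score_pair(query, chunks[i]),
        reverse=True,
    )
    return [chunks[i] for i in reranked[:final_k]]


def toy_pair_scorer(query: str, chunk: str) -> float:
    """Stand-in for a reranker (assumption): fraction of query tokens in chunk."""
    query_tokens = set(query.lower().split())
    chunk_tokens = set(chunk.lower().split())
    if not query_tokens:
        return 0.0
    return len(query_tokens & chunk_tokens) / len(query_tokens)


# Made-up data: the fruit chunk sits slightly closer to the query vector
# than the company chunk, mimicking the ambiguity described above.
chunks = [
    "apple pie recipe with fresh fruit",
    "apple quarterly earnings and iphone revenue",
    "weather forecast for tomorrow",
]
chunk_vecs = [[0.9, 0.1], [0.8, 0.2], [0.0, 1.0]]

top = retrieve_then_rerank(
    "apple iphone revenue", [1.0, 0.0], chunks, chunk_vecs,
    toy_pair_scorer, coarse_k=2, final_k=1,
)
print(top)
```

With this toy data, cosine similarity alone ranks the fruit chunk first, while the second stage promotes the company chunk because it is scored against the query text directly. The `coarse_k` value trades recall against reranking cost: the reranker only sees what the coarse filter lets through.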
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T22:23:43.552700+00:00— report_created — created