Report #87288
[counterintuitive] Is cosine similarity of embeddings a reliable measure of semantic relevance?
Use embedding similarity as a first-pass filter, but validate relevance with a cross-encoder or an LLM-based grader for complex queries.
Journey Context:
Developers often use cosine similarity on embeddings as the sole metric for RAG retrieval. But an embedding compresses a text's meaning into a single vector, which can lose nuance, negation, and specific entity names (e.g., 'not profitable' vs 'profitable' can end up with nearly identical vectors). Bi-encoder (embedding) similarity is fast but shallow: the query and the document are encoded independently, so the model never compares them directly. Cross-encoders are slower but evaluate both texts jointly, which lets them judge relevance more precisely.
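The two-stage pattern described above can be sketched roughly as follows. This is a minimal illustration, not a reference implementation: `retrieve_then_rerank`, `toy_pair_scorer`, and the hand-written vectors are all hypothetical names and values made up for the example. In a real pipeline the vectors would come from an embedding model and `pair_scorer` would wrap a cross-encoder or an LLM-based grader.

```python
import math
from typing import Callable, List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve_then_rerank(
    query: str,
    query_vec: Sequence[float],
    docs: List[str],
    doc_vecs: List[Sequence[float]],
    pair_scorer: Callable[[str, str], float],
    first_pass_k: int = 20,
    final_k: int = 5,
) -> List[str]:
    # Stage 1: cheap first-pass filter using embedding similarity only.
    by_cosine = sorted(
        range(len(docs)),
        key=lambda i: cosine_similarity(query_vec, doc_vecs[i]),
        reverse=True,
    )
    candidates = by_cosine[:first_pass_k]

    # Stage 2: the slower scorer sees the query and each candidate together.
    reranked = sorted(
        candidates,
        key=lambda i: pair_scorer(query, docs[i]),
        reverse=True,
    )
    return [docs[i] for i in reranked[:final_k]]


def toy_pair_scorer(query: str, doc: str) -> float:
    # Hypothetical stand-in for a cross-encoder: word overlap, with a
    # penalty when only one of the two texts contains the word "not".
    q_neg = " not " in f" {query.lower()} "
    d_neg = " not " in f" {doc.lower()} "
    overlap = len(set(query.lower().split()) & set(doc.lower().split()))
    return overlap - (5.0 if q_neg != d_neg else 0.0)


# Made-up vectors: the two "profitable" sentences are nearly identical
# in embedding space, which is the failure mode described above.
docs = [
    "The company was not profitable in 2023.",
    "The company was profitable in 2023.",
    "The office moved to a new building.",
]
doc_vecs = [[0.90, 0.10, 0.05], [0.88, 0.12, 0.05], [0.05, 0.10, 0.95]]
query = "Which year was the company profitable?"
query_vec = [0.90, 0.10, 0.05]

top = retrieve_then_rerank(
    query, query_vec, docs, doc_vecs, toy_pair_scorer,
    first_pass_k=2, final_k=1,
)
print(top)
```

With these toy inputs, cosine similarity alone ranks the negated sentence first, and the second stage promotes the non-negated one. The keyword heuristic only makes the control flow visible; it is not a substitute for a trained reranker.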
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T05:05:56.351400+00:00— report_created — created