Report #78988
[counterintuitive] Is high cosine similarity in embeddings a reliable measure of semantic relevance?
Use cosine similarity as a coarse retrieval filter, then score the shortlisted candidates with a cross-encoder (re-ranker) model for actual semantic relevance. Never treat a raw similarity threshold as an absolute measure of relevance.
Journey Context:
RAG systems often rely purely on embedding cosine similarity to find relevant chunks. Cosine similarity measures the angle between vectors, so it captures broad topical overlap but tends to miss nuance, negation, and query-specific relevance. For example, a document stating 'not X' can still score highly against a query about 'X'. Bi-encoders embed the query and each document independently, which makes them fast but shallow; cross-encoders process the query and document together, which makes them slower but more sensitive to how the two actually relate.
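The two-stage pattern described above can be sketched roughly as follows. This is a minimal illustration under loud assumptions, not a production pipeline: `bow_embed` is a toy bag-of-words stand-in for a real bi-encoder, and `toy_pair_score` is a hypothetical stand-in for a real cross-encoder (it only penalizes a mismatched "not"). The function names, the sample query, and the sample documents are all invented for this example. It uses only the standard library so that it runs without downloading a model.

```python
import math
from collections import Counter
from typing import Callable, List, Tuple


def bow_embed(text: str) -> Counter:
    """Toy stand-in for a bi-encoder: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def toy_pair_score(query: str, doc: str) -> float:
    """Hypothetical stand-in for a cross-encoder.

    It sees the query and document together, and subtracts a large
    penalty when exactly one of them contains the word "not".
    """
    q_tokens, d_tokens = query.lower().split(), doc.lower().split()
    overlap = cosine(Counter(q_tokens), Counter(d_tokens))
    negation_mismatch = ("not" in q_tokens) != ("not" in d_tokens)
    return overlap - 1.0 if negation_mismatch else overlap


def retrieve_then_rerank(
    query: str,
    docs: List[str],
    pair_score: Callable[[str, str], float],
    k: int = 2,
) -> List[Tuple[float, str]]:
    """Stage 1: coarse cosine filter. Stage 2: re-rank the top k pairs."""
    q_vec = bow_embed(query)
    candidates = sorted(
        docs, key=lambda d: cosine(q_vec, bow_embed(d)), reverse=True
    )[:k]
    scored = [(pair_score(query, d), d) for d in candidates]
    return sorted(scored, key=lambda item: item[0], reverse=True)


QUERY = "the drug is safe for children"
DOCS = [
    "the drug is not safe for children",
    "the drug is safe for children and adults",
    "weather forecast for tomorrow",
]

# Stage 1 alone ranks the negated document first, because it shares
# every query token and adds only one extra word.
for doc in DOCS:
    print(f"{cosine(bow_embed(QUERY), bow_embed(doc)):.3f}  {doc}")

# After re-ranking, the negated document drops below the affirmative one.
RANKED = retrieve_then_rerank(QUERY, DOCS, toy_pair_score, k=2)
for score, doc in RANKED:
    print(f"{score:.3f}  {doc}")
```

In a real system the two stand-ins would presumably be replaced by an actual embedding model and an actual cross-encoder, and `k` would be tuned so that the slower second stage only sees a small shortlist. Note that the sketch ranks by score and never compares a similarity value against a fixed cutoff.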
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T15:10:15.416762+00:00— report_created — created