Report #71455
[counterintuitive] cosine similarity is not semantic relevance
Use cosine similarity as a fast initial filter, but follow it with a cross-encoder reranker or an LLM-based relevance check to capture true semantic nuance and query-document interaction.
Journey Context:
RAG pipelines often equate high cosine similarity in embedding space with high semantic relevance. Bi-encoder embeddings are trained to capture general topical proximity: the query and each document are encoded independently into a single vector, which compresses nuance and cannot model query-document interactions. As a result, retrieval can return documents that share vocabulary with the query but answer a different question. Bi-encoder embeddings are for search; cross-encoders are for relevance.
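A minimal sketch of the two-stage pattern described above, in plain Python. The embedding vectors are hand-made toy values, and `toy_score_pair` is a hypothetical stand-in for a cross-encoder or LLM relevance check; the function names and parameters (`retrieve_then_rerank`, `k_filter`, `k_final`) are assumptions for illustration, not part of any library.

```python
import math
from typing import Callable, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve_then_rerank(
    query: str,
    query_vec: Sequence[float],
    docs: List[str],
    doc_vecs: List[Sequence[float]],
    score_pair: Callable[[str, str], float],
    k_filter: int = 2,
    k_final: int = 1,
) -> List[str]:
    # Stage 1: cheap filter. Rank all documents by cosine similarity of
    # precomputed embeddings and keep the top k_filter candidates.
    by_cosine = sorted(
        range(len(docs)),
        key=lambda i: cosine(query_vec, doc_vecs[i]),
        reverse=True,
    )
    candidates = by_cosine[:k_filter]

    # Stage 2: rerank only the candidates with a scorer that sees the query
    # and the document text together (a cross-encoder in a real pipeline).
    reranked = sorted(
        candidates,
        key=lambda i: score_pair(query, docs[i]),
        reverse=True,
    )
    return [docs[i] for i in reranked[:k_final]]


def toy_score_pair(query: str, doc: str) -> float:
    # Hypothetical placeholder for a cross-encoder or LLM relevance check.
    # It only rewards documents that mention the action the query asks for.
    return 1.0 if "reset" in query.lower() and "reset" in doc.lower() else 0.0


# Toy corpus and hand-made 2-d "embeddings" (assumed values, not model output).
docs = [
    "Password policy: passwords must be at least 12 characters.",
    "To reset your password, open Settings and choose 'Forgot password'.",
    "Office hours are 9 to 5 on weekdays.",
]
doc_vecs = [[1.0, 0.1], [0.8, 0.6], [0.0, 1.0]]
query = "How do I reset my password?"
query_vec = [1.0, 0.2]

cosine_top = max(range(len(docs)), key=lambda i: cosine(query_vec, doc_vecs[i]))
final = retrieve_then_rerank(query, query_vec, docs, doc_vecs, toy_score_pair)
```

With this toy data, cosine similarity alone ranks the password policy document first because it is topically closest, while the rerank stage promotes the document that actually answers the question. In a real pipeline the scorer is the expensive step, so `k_filter` bounds how many query-document pairs it has to evaluate.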
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T02:30:43.464715+00:00 — report_created — created