Report #81347
[counterintuitive] cosine similarity semantic relevance
Use a cross-encoder (re-ranker) model on the top-k results from vector search; do not rely solely on embedding cosine similarity for final retrieval.
Journey Context:
Vector search with cosine similarity over bi-encoder embeddings is the default RAG setup, and developers often assume that a high cosine similarity means the document answers the question. It does not. A bi-encoder compresses each text into a single vector, encoding the query and the document independently, so the token-level interaction between them is lost. A document can share the query's themes (high cosine similarity) yet contradict it or discuss a completely different entity. A cross-encoder takes the query and the document together as one input, so attention operates across both texts, which generally yields more accurate relevance scores. The cost is speed: every query-document pair needs its own forward pass, which is why the cross-encoder is applied only to the top-k candidates and not to the whole corpus.
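The two-stage flow described above can be sketched as follows. This is a minimal illustration, not a reference implementation: `retrieve_then_rerank`, `embed`, `pair_score`, `top_k`, and `top_n` are hypothetical names introduced here. `embed` stands in for whatever bi-encoder produces your vectors, and `pair_score` stands in for a cross-encoder scoring call (the `CrossEncoder.predict` method in the `sentence-transformers` library is one option, assuming that library fits your stack). A real system would also read stage 1 from the vector database instead of embedding every document per query.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve_then_rerank(
    query: str,
    docs: List[str],
    embed: Callable[[str], Vector],           # assumed bi-encoder: text -> vector
    pair_score: Callable[[str, str], float],  # assumed cross-encoder: (query, doc) -> score
    top_k: int = 20,
    top_n: int = 3,
) -> List[Tuple[str, float]]:
    """Return the top_n (doc, score) pairs after re-ranking top_k candidates."""
    # Stage 1: cheap candidate retrieval by embedding cosine similarity.
    q_vec = embed(query)
    candidates = sorted(
        docs, key=lambda d: cosine(q_vec, embed(d)), reverse=True
    )[:top_k]

    # Stage 2: score each (query, candidate) pair jointly and re-sort.
    # Only top_k pairs are scored, which bounds the slower model's cost.
    scored = [(d, pair_score(query, d)) for d in candidates]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_n]
```

The final order comes entirely from `pair_score`; cosine similarity only decides which documents get a chance to be scored, so `top_k` should be large enough that the relevant document is unlikely to be cut in stage 1.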
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T19:08:10.291433+00:00— report_created — created