Report #64446
[counterintuitive] use high cosine similarity of embeddings to determine exact semantic relevance
Use bi-encoder embeddings for fast top-k retrieval, then score the actual relevance of those candidates with a cross-encoder or an LLM-based reranker. Do not treat a cosine similarity threshold as an absolute relevance filter.
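The two-stage pattern above can be sketched in plain Python. This is a minimal sketch under stated assumptions, not a reference implementation: `retrieve_then_rerank` and `rerank_score` are hypothetical names, the vectors are assumed to come from some bi-encoder, and `rerank_score` stands in for a cross-encoder or LLM-based reranker that reads the query and document together. Cosine similarity is used only to order candidates for recall; the final order comes from the reranker, and no absolute threshold is applied.

```python
import math
from typing import Callable, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0 or nb == 0:
        return 0.0
    return dot / (na * nb)


def retrieve_then_rerank(
    query: str,
    query_vec: Sequence[float],
    docs: Sequence[str],
    doc_vecs: Sequence[Sequence[float]],
    rerank_score: Callable[[str, str], float],
    k: int = 3,
) -> list[tuple[str, float]]:
    # Stage 1: cheap bi-encoder recall. Cosine only ORDERS the candidates
    # so we can keep the top k; it is never compared to a fixed cutoff.
    ranked = sorted(
        range(len(docs)),
        key=lambda i: cosine(query_vec, doc_vecs[i]),
        reverse=True,
    )
    candidates = ranked[:k]
    # Stage 2: score each (query, document) pair jointly. rerank_score is a
    # hypothetical placeholder for a cross-encoder or LLM judge.
    rescored = [(docs[i], rerank_score(query, docs[i])) for i in candidates]
    return sorted(rescored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Toy 3-d vectors and pretend reranker scores, for illustration only.
    docs = ["doc A", "doc B", "doc C"]
    doc_vecs = [[1.0, 1.0, 0.0], [0.0, 0.0, 1.0], [1.0, 0.9, 0.1]]
    pretend_scores = {"doc A": 0.2, "doc B": 0.99, "doc C": 0.9}
    results = retrieve_then_rerank(
        "example query", [1.0, 1.0, 0.0], docs, doc_vecs,
        rerank_score=lambda q, d: pretend_scores[d], k=2,
    )
    print(results)  # [('doc C', 0.9), ('doc A', 0.2)]
```

Choosing k larger than the number of results you finally need gives the reranker room to correct the bi-encoder's ordering. A document the first stage drops (like "doc B" above) can never be recovered by the second.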
Journey Context:
Developers often treat embedding cosine similarity as a calibrated, absolute measure of semantic relatedness, applying fixed thresholds to filter documents or to make binary relevance decisions. However, an embedding compresses a text's meaning into a single vector, losing nuance, directional intent, and negation. As a result, a document that contradicts a query can still have high cosine similarity to it: "X causes Y" and "X does not cause Y" share almost all of their content. Bi-encoders are fast for search because the query and each document are encoded independently, so document vectors can be precomputed. That same independence means there is no cross-attention between query and document tokens, which makes bi-encoders poor at precise relevance ranking.
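The negation problem can be illustrated with a deliberately crude stand-in. In the sketch below a bag-of-words count vector plays the role of a pooled sentence embedding; this is an assumption made purely to show the mechanism, not a measurement of any real embedding model, which would handle paraphrase far better. The point is that one negation token barely moves a single pooled vector, so a fixed cutoff (the hypothetical `THRESHOLD`) keeps the contradicting sentence and drops the supporting one.

```python
import math
from collections import Counter


def bow_vector(text: str) -> Counter:
    """Toy 'embedding': lowercase word counts (a stand-in, not a real model)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a)  # Counter returns 0 for missing words
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


claim = "aspirin increases bleeding risk"
negated = "aspirin never increases bleeding risk"        # contradicts the claim
supporting = "bleeding is a known side effect of aspirin"  # agrees with it

q = bow_vector(claim)
print(f"claim vs negated:    {cosine(q, bow_vector(negated)):.3f}")     # 0.894
print(f"claim vs supporting: {cosine(q, bow_vector(supporting)):.3f}")  # 0.354

# A fixed cutoff of the kind this report warns against (hypothetical value).
THRESHOLD = 0.8
kept = [d for d in (negated, supporting) if cosine(q, bow_vector(d)) >= THRESHOLD]
print(kept)  # ['aspirin never increases bleeding risk']
```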
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T14:39:41.719964+00:00— report_created — created