Report #27022
[counterintuitive] High cosine similarity between embeddings guarantees semantic relevance for retrieval
Use embedding similarity as a first-pass recall filter, not a final relevance judgment. Add a cross-encoder reranking step for precision, and evaluate retrieval quality end-to-end on your actual task metrics rather than trusting similarity scores alone.
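To make "evaluate on task metrics" concrete, here is a minimal sketch of two common ranking metrics computed against labeled relevance judgments. The function names and the use of string document IDs are assumptions for illustration, and the right metric depends on your task. The sketch assumes each ranked list contains unique IDs.

```python
from typing import List, Set


def recall_at_k(ranked: List[str], relevant: Set[str], k: int) -> float:
    """Fraction of the relevant documents that appear in the top k results."""
    if not relevant:
        return 0.0
    hits = sum(1 for doc_id in ranked[:k] if doc_id in relevant)
    return hits / len(relevant)


def reciprocal_rank(ranked: List[str], relevant: Set[str]) -> float:
    """1 / rank of the first relevant result, or 0.0 if none is retrieved."""
    for rank, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0
```

Computing these on the same labeled queries with and without the reranking step shows whether reranking helps on your data, independent of the raw similarity scores.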
Journey Context:
Embeddings compress semantics into a single vector, losing task-relevant nuance. High cosine similarity occurs for superficially similar but task-irrelevant content (e.g., 'bank' as financial institution vs. river bank). Bi-encoder embeddings are trained for retrieval speed, not precision: they're optimized to be 'good enough' for candidate generation. Cross-encoder models that jointly process query and document produce much more accurate relevance scores but are too slow for initial retrieval over large corpora. The practical two-stage pattern: bi-encoder for top-K candidate retrieval (fast, approximate) → cross-encoder reranking (slow, precise) → final selection. Additionally, off-the-shelf embeddings are trained on general corpora and underperform on domain-specific content without fine-tuning on in-domain data.
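The two-stage pattern can be sketched as follows. This is a toy illustration and not a real model pipeline: `embed` is a bag-of-words stand-in for a bi-encoder, and `toy_pair_score` with its hand-written `FINANCE_TERMS` set is a hypothetical stand-in for a cross-encoder. In practice both would be replaced by trained models. Only the control flow (retrieve top-K by cosine similarity, then rerank with a pairwise scorer) is the point.

```python
import math
from collections import Counter
from typing import Callable, List


def embed(text: str) -> Counter:
    """Toy stand-in for a bi-encoder: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query: str, docs: List[str], k: int) -> List[str]:
    """Stage 1: fast, approximate top-k candidates by cosine similarity."""
    q = embed(query)
    scored = [(cosine(q, embed(d)), d) for d in docs]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [d for _, d in scored[:k]]


def rerank(query: str, candidates: List[str],
           score_pair: Callable[[str, str], float], n: int) -> List[str]:
    """Stage 2: slower, more precise scoring of each (query, doc) pair."""
    return sorted(candidates, key=lambda d: score_pair(query, d), reverse=True)[:n]


# Hypothetical vocabulary used only by the toy scorer below.
FINANCE_TERMS = {"account", "fees", "checking", "credit", "cost"}


def toy_pair_score(query: str, doc: str) -> float:
    """Hypothetical stand-in for a cross-encoder: looks at query and doc together."""
    q_terms, d_terms = set(query.lower().split()), set(doc.lower().split())
    if q_terms & FINANCE_TERMS:  # query is about finance
        return float(len(d_terms & FINANCE_TERMS))
    return float(len(q_terms & d_terms))


docs = [
    "river bank bank erosion",
    "how much does a checking account cost each month at a credit union",
    "hiking trails in the alps",
]
query = "bank account fees"
candidates = retrieve(query, docs, k=2)   # river-bank doc ranks first here
final = rerank(query, candidates, toy_pair_score, n=1)
```

In this toy setup the river-bank document has the highest cosine similarity because it repeats the word 'bank', while the pairwise scorer moves the checking-account document to the top. K should be large enough that the relevant document survives stage 1, since the reranker can only reorder what it is given.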
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T23:45:18.065588+00:00— report_created — created