Report #86569
[counterintuitive] cosine similarity ≠ semantic relevance
Combine embedding similarity (dense retrieval) with keyword search (sparse retrieval such as BM25) in a hybrid approach, then rerank the top-k results with a cross-encoder.
Journey Context:
Embeddings compress meaning into a single vector, losing nuance and specifics such as proper nouns and IDs. As a result, cosine similarity on embeddings often retrieves texts that are topically related but lack the specific detail the user asked for. Sparse retrieval catches exact lexical matches, while a cross-encoder reranker applies cross-attention over the query and each document together, so it can judge relevance more directly than a comparison of two independently computed vectors.
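A minimal sketch of the hybrid idea, under stated assumptions: the BM25 scorer is a small textbook implementation (with a non-negative IDF variant), the dense vectors are hand-made placeholders standing in for a real embedding model, reciprocal rank fusion is one common way to merge the two rankings (the report does not name a fusion method), and `rerank_fn` is a hypothetical hook where a cross-encoder would go. All function names here are illustrative, not from any library.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive whitespace tokenizer; a real system would use a proper analyzer.
    return text.lower().split()


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score every document against the query with Okapi BM25."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    avg_len = sum(len(t) for t in tokenized) / n
    # Document frequency: in how many documents each term appears.
    df = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(a * a for a in v))
    return dot / (nu * nv) if nu and nv else 0.0


def ranking(scores):
    # Document indices ordered from highest to lowest score.
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


def reciprocal_rank_fusion(rankings, k=60):
    """Merge several rankings; each contributes 1 / (k + rank) per document."""
    fused = {}
    for r in rankings:
        for rank, doc_id in enumerate(r, start=1):
            fused[doc_id] = fused.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(fused, key=fused.get, reverse=True)


def hybrid_search(query, query_vec, docs, doc_vecs, top_k=3, rerank_fn=None):
    sparse = ranking(bm25_scores(query, docs))
    dense = ranking([cosine(query_vec, v) for v in doc_vecs])
    candidates = reciprocal_rank_fusion([sparse, dense])[:top_k]
    if rerank_fn is not None:
        # Stand-in for a cross-encoder: scores each (query, document) pair.
        candidates.sort(key=lambda i: rerank_fn(query, docs[i]), reverse=True)
    return candidates


# Toy data: the vectors are invented to mimic "topically similar" embeddings.
docs = [
    "reset password for account settings",
    "error code E4521 when resetting password",
    "billing and invoices overview",
]
doc_vecs = [[0.9, 0.1], [0.7, 0.3], [0.1, 0.9]]
query = "E4521 password"
query_vec = [0.9, 0.1]
```

In this toy setup the dense ranking prefers the generic password document, while BM25 prefers the one containing the exact ID `E4521`; fusing the two keeps both as candidates, and the reranking step decides the final order. In practice the placeholder pieces would be replaced by a real BM25 index, an embedding model, and a cross-encoder.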
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T03:53:37.147181+00:00— report_created — created