Report #52980
[counterintuitive] Is cosine similarity on embeddings enough for semantic search?
Combine embedding similarity with keyword/lexical search (hybrid search) and re-ranking models (cross-encoders) for robust retrieval.
Journey Context:
Embeddings compress meaning into fixed-size vectors, which blurs nuance and can lose proper nouns and exact-match signals. As a result, cosine similarity on embeddings often fails on specific IDs, acronyms, or negations. Hybrid search (BM25 + vector) captures both semantic meaning and exact lexical matches, while cross-encoders re-rank the top results by scoring each query-document pair jointly rather than comparing two independently computed vectors.
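A minimal sketch of the hybrid approach described above, using only the Python standard library. Everything here is illustrative rather than taken from the report: the function names (`bm25_scores`, `rrf`, `hybrid_search`), the whitespace tokenizer, the hand-made 2-D vectors standing in for real embeddings, and the choice of reciprocal rank fusion (RRF) to merge the two rankings are all assumptions. The optional `rerank` argument is a hypothetical hook where a cross-encoder's scoring function could be plugged in.

```python
import math
from collections import Counter

def tokenize(text):
    # Naive whitespace tokenizer (assumption); real systems need better analysis.
    return text.lower().split()

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score every document against the query with Okapi BM25."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    avgdl = sum(len(t) for t in tokenized) / n
    df = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def rrf(rankings, k=60):
    """Reciprocal rank fusion: sum 1 / (k + rank) over all rankings."""
    fused = Counter()
    for ranking in rankings:
        for pos, doc_id in enumerate(ranking):
            fused[doc_id] += 1.0 / (k + pos + 1)
    return sorted(fused, key=lambda d: fused[d], reverse=True)

def hybrid_search(query, query_vec, docs, doc_vecs, top_k=3, rerank=None):
    ids = range(len(docs))
    lex = bm25_scores(query, docs)
    # Only documents with a lexical hit enter the BM25 ranking.
    lexical = sorted((i for i in ids if lex[i] > 0),
                     key=lambda i: lex[i], reverse=True)
    sims = [cosine(query_vec, v) for v in doc_vecs]
    semantic = sorted(ids, key=lambda i: sims[i], reverse=True)
    candidates = rrf([lexical, semantic])[:top_k]
    if rerank is not None:
        # Hypothetical hook: rerank(query, doc_text) -> float, e.g. a cross-encoder.
        candidates = sorted(candidates,
                            key=lambda i: rerank(query, docs[i]), reverse=True)
    return candidates

# Toy data: the vectors are hand-made stand-ins for real embeddings.
docs = [
    "reset your password from the account page",
    "error code E1234 means the disk is full",
    "how to change login credentials",
]
doc_vecs = [[0.9, 0.1], [0.1, 0.9], [0.7, 0.3]]
query, query_vec = "E1234", [1.0, 0.0]

vector_only = sorted(range(len(docs)),
                     key=lambda i: cosine(query_vec, doc_vecs[i]), reverse=True)
hybrid = hybrid_search(query, query_vec, docs, doc_vecs)
```

In this toy setup the query vector deliberately points away from the document that contains the ID, so the vector-only ranking puts that document last, while the fused ranking puts it first because BM25 matches the exact token. RRF is used here because it merges rankings without needing the BM25 and cosine scores to be on a comparable scale; a weighted sum of normalized scores is another common choice.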
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T19:25:22.085636+00:00— report_created — created