Report #76295
[counterintuitive] Is vector similarity search sufficient for semantic retrieval?
Combine vector search with keyword/BM25 search (hybrid search) and use cross-encoder re-ranking models on the top-K results.
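A minimal sketch of that pipeline follows. The report does not say how the two result lists are merged. This sketch assumes Reciprocal Rank Fusion (RRF, with the commonly used constant k=60), which is a swapped-in technique, not something the report prescribes. The input rankings and the `rerank_score` callable are hypothetical stand-ins for a BM25 index, a vector index and a cross-encoder model that scores (query, document) pairs.

```python
from typing import Callable, Dict, List, Sequence


def reciprocal_rank_fusion(rankings: Sequence[Sequence[str]], k: int = 60) -> List[str]:
    """Fuse ranked lists of doc IDs with RRF: score(d) = sum over lists of 1 / (k + rank)."""
    scores: Dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc ID so the order is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))


def hybrid_search(
    query: str,
    bm25_ranking: Sequence[str],
    vector_ranking: Sequence[str],
    rerank_score: Callable[[str, str], float],  # hypothetical cross-encoder wrapper
    top_k: int = 3,
) -> List[str]:
    """Fuse keyword and vector results, then re-rank only the top_k fused candidates."""
    candidates = reciprocal_rank_fusion([bm25_ranking, vector_ranking])[:top_k]
    # The expensive joint (query, doc) scorer only sees the short candidate list.
    return sorted(candidates, key=lambda d: rerank_score(query, d), reverse=True)


if __name__ == "__main__":
    bm25 = ["sku-123", "doc-a", "doc-b"]    # exact-term hits
    dense = ["doc-a", "doc-c", "sku-123"]   # semantic neighbours
    fake_cross_encoder = {"sku-123": 0.9, "doc-a": 0.4, "doc-c": 0.1}
    print(hybrid_search("sku-123 price", bm25, dense, lambda q, d: fake_cross_encoder[d]))
```

Re-ranking only the top-K keeps the cost of the cross-encoder bounded, since it must run once per (query, candidate) pair rather than once per document in the index.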
Journey Context:
Embeddings compress a passage's meaning into a single vector, which loses nuance. Exact-match tokens such as proper nouns, SKUs and IDs are often handled poorly by dense vector search, because the embedding reflects the surrounding context rather than the exact string. BM25 catches the exact terms, while vectors catch the semantic intent. A cross-encoder re-ranker reads the query and the document together in a single pass, which avoids the bi-encoder approximation: a bi-encoder embeds the query and the document separately and compares them only through one vector-similarity score.
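To make the "BM25 catches the exact terms" point concrete, here is a minimal Okapi BM25 scorer. It assumes the common defaults k1=1.5 and b=0.75, a naive lowercase whitespace tokenizer, and the Lucene-style IDF log(1 + (N - df + 0.5) / (df + 0.5)); the documents and SKU strings are made up for illustration.

```python
import math
from collections import Counter
from typing import List


def bm25_scores(query: str, docs: List[str], k1: float = 1.5, b: float = 0.75) -> List[float]:
    """Okapi BM25 score of every document for the query, using whitespace tokenization."""
    tokenized = [d.lower().split() for d in docs]
    n = len(tokenized)
    avgdl = sum(len(toks) for toks in tokenized) / n
    # Document frequency: the number of documents each term appears in.
    df = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue  # BM25 gives no credit for near-miss strings
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            freq = tf[term]
            norm = freq + k1 * (1 - b + b * len(toks) / avgdl)
            score += idf * freq * (k1 + 1) / norm
        scores.append(score)
    return scores


if __name__ == "__main__":
    docs = [
        "replacement filter for model XK-4410 vacuum",
        "replacement filter for model XK-4401 vacuum",
        "how to clean a vacuum filter",
    ]
    print(bm25_scores("XK-4410 filter", docs))
```

Only the first document gets credit for the rare token `xk-4410`. The near-miss SKU `XK-4401` scores no better than an unrelated filter-cleaning page, whereas a dense embedding may place the two SKUs close together because their surrounding context is nearly identical.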
⚠ Workarounds are unverified; always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T10:38:57.822230+00:00— report_created — created