Report #41967
[counterintuitive] Is cosine similarity on vector embeddings enough for semantic search?
Implement hybrid search combining vector embeddings (dense) with traditional keyword search like BM25 (sparse) to handle exact matches, negations, and out-of-vocabulary terms.
Journey Context:
Developers often assume that vector embeddings capture semantics so well that keyword search is obsolete. However, embeddings struggle with exact matches (e.g., specific IDs, names, or typos), negations ('not', 'without'), and rare words. A vector search for 'apple' might rank a document about 'orange' highly because the two are semantically close, ahead of a document that contains the exact string 'apple'. Hybrid search merges the semantic understanding of dense vectors with the precision of sparse lexical retrieval.
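A minimal sketch of the idea, using only the Python standard library. It scores a tiny corpus with a hand-written BM25 and with cosine similarity, then merges the two rankings with Reciprocal Rank Fusion (RRF). The report does not name a fusion method, so RRF is an assumption here, as are the constant k=60, the sample documents, and the 3-dimensional vectors, which are made up by hand and do not come from a real embedding model.

```python
import math
from collections import Counter

def tokenize(text):
    # Naive lowercase whitespace tokenizer; real systems use proper analyzers.
    return text.lower().split()

def bm25_scores(query, docs, k1=1.5, b=0.75):
    # Okapi BM25 score of every document for the query (sparse signal).
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    avgdl = sum(len(t) for t in tokenized) / n
    df = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

def cosine(u, v):
    # Cosine similarity between two equal-length vectors (dense signal).
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def ranking(scores):
    # Document indices ordered from best to worst score.
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

def rrf(rankings, k=60):
    # Reciprocal Rank Fusion: each ranking adds 1 / (k + rank) per document.
    fused = {}
    for r in rankings:
        for rank, doc_id in enumerate(r, start=1):
            fused[doc_id] = fused.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(fused, key=fused.get, reverse=True)

# Hypothetical corpus and hand-made toy vectors (NOT real embeddings).
docs = [
    "payment failed with error E4012 on checkout",
    "billing problems and checkout failures explained",
    "how to bake sourdough bread",
    "generic error handling guide",
]
doc_vecs = [
    [0.80, 0.60, 0.00],
    [0.95, 0.30, 0.10],
    [0.00, 0.10, 1.00],
    [0.50, 0.50, 0.70],
]
query = "error E4012"
query_vec = [1.0, 0.0, 0.0]  # assumed embedding of the query

sparse_rank = ranking(bm25_scores(query, docs))
dense_rank = ranking([cosine(query_vec, v) for v in doc_vecs])
fused = rrf([sparse_rank, dense_rank])

print("sparse:", sparse_rank)
print("dense: ", dense_rank)
print("fused: ", fused)
```

In this toy setup the dense ranking puts the semantically similar billing document (index 1) first, while BM25 puts the document containing the literal ID 'E4012' (index 0) first; the fused ranking keeps index 0 on top. RRF works on ranks only, so the BM25 and cosine scores never need to be put on a common scale, which is one reason it is a common default. Weighted score blending is an alternative if the two signals should not count equally.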
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T00:54:53.063988+00:00— report_created — created