Report #38250
[counterintuitive] dense embeddings are sufficient for retrieval
Implement hybrid search (combining BM25/sparse keyword search with dense vector search) so that exact matches, specific IDs, and proper nouns are not missed.
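The report does not say how the two result lists should be merged. Below is a minimal sketch that uses reciprocal rank fusion (RRF), one common fusion method and an assumption here, not something the report prescribes. The constant `k=60` and all document IDs are hypothetical and used only for illustration. Each retriever contributes `1 / (k + rank)` for every document it returns, so a document that only the keyword side finds still reaches the fused list.

```python
from collections import defaultdict

def reciprocal_rank_fusion(ranked_lists, k=60):
    """Fuse ranked lists of doc IDs with RRF: score(d) = sum over lists of 1 / (k + rank)."""
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc ID so the output is deterministic
    return sorted(scores, key=lambda d: (-scores[d], d))

# Hypothetical outputs of the two retrievers for the query "HNSW"
bm25_ranking = ["doc_hnsw_definition", "doc_graph_index", "doc_faiss_notes"]
dense_ranking = ["doc_ann_overview", "doc_graph_index", "doc_vector_db_intro", "doc_faiss_notes"]

fused = reciprocal_rank_fusion([bm25_ranking, dense_ranking])
print(fused)
```

In this toy run the dense list never returns `doc_hnsw_definition`, but the fused list still contains it because BM25 ranked it first. RRF works on ranks only, which avoids normalizing BM25 scores and cosine similarities onto a common scale.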
Journey Context:
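Vector databases and dense embeddings are often treated as a complete replacement for traditional keyword search. Dense embeddings excel at semantic similarity but are lossy when it comes to exact keyword matching: specific serial numbers, names, and rare acronyms can be blurred into their semantic neighbors. A query for 'HNSW' might retrieve documents about 'approximate nearest neighbor' search yet miss the one document that explicitly defines the 'HNSW' acronym, if the embedding space compressed that token away. BM25 handles exact lexical matches reliably, because a document scores only on query terms it actually contains. The trade-off is that BM25 cannot match synonyms or misspellings, which is why the two are combined rather than one replacing the other.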
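To make the lexical side concrete, here is a minimal from-scratch Okapi BM25 sketch. It assumes the common defaults `k1=1.5` and `b=0.75`, a Lucene-style IDF with a `+1` inside the logarithm, and a simple lowercase alphanumeric tokenizer. The three-document corpus is hypothetical. It only illustrates why a document containing the literal token `hnsw` wins, while documents without that token score zero however related they are semantically.

```python
import math
import re
from collections import Counter

def tokenize(text):
    # Lowercase and split on anything that is not a letter or digit
    return re.findall(r"[a-z0-9]+", text.lower())

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score every document in `docs` (id -> text) against `query` with Okapi BM25."""
    tokenized = {doc_id: tokenize(text) for doc_id, text in docs.items()}
    n_docs = len(tokenized)
    avg_len = sum(len(toks) for toks in tokenized.values()) / n_docs
    # Document frequency: how many documents contain each term
    df = Counter()
    for toks in tokenized.values():
        df.update(set(toks))
    scores = {}
    for doc_id, toks in tokenized.items():
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if tf[term] == 0:
                continue  # no exact lexical match, so this term contributes nothing
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores[doc_id] = score
    return scores

# Hypothetical toy corpus
docs = {
    "doc_hnsw_definition": "HNSW stands for Hierarchical Navigable Small World, a graph index",
    "doc_ann_overview": "approximate nearest neighbor search trades exactness for speed",
    "doc_faiss_notes": "faiss supports several approximate nearest neighbor indexes",
}

scores = bm25_scores("HNSW", docs)
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking[0])  # doc_hnsw_definition
```

A dense retriever would likely place the two ANN documents close to this query; BM25 gives them a score of exactly zero because neither contains `hnsw`. That complementary behavior is what hybrid search relies on.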
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T18:40:53.551853+00:00— report_created — created