Report #92411
[counterintuitive] Is dense vector search better than keyword search for RAG?
Implement hybrid search (combining BM25/sparse keyword search with dense vector search) rather than relying purely on embedding similarity.
Journey Context:
The consensus shifted heavily toward "embeddings understand semantics, keywords are outdated." But dense embeddings are often weak at exact matches: specific serial numbers, names, or rare acronyms tend to get mapped to generic neighborhoods of the embedding space, so the document containing the exact string is not retrieved. BM25 excels at exact token matching. Hybrid search runs both retrievers and merges their results, capturing both semantic intent and lexical precision.
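A minimal sketch of the idea, not a production implementation. It assumes a whitespace tokenizer, a common BM25 variant (Lucene-style IDF, k1=1.5, b=0.75), and reciprocal rank fusion (RRF) as the merge step; the report itself does not name a fusion method, so RRF is a swapped-in choice. The documents, the query, and `dense_ranking` are hypothetical: `dense_ranking` stands in for whatever order an embedding model would return.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: naive lowercase + whitespace split; real systems use a proper analyzer.
    return text.lower().split()


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score every document against the query with a basic BM25 variant."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    avgdl = sum(len(toks) for toks in tokenized) / n
    # Document frequency: in how many documents each term appears.
    df = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def rank(scores):
    # Document indices ordered from best to worst score (ties keep index order).
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


def rrf(rankings, k=60):
    """Reciprocal rank fusion: sum 1 / (k + position) over all input rankings."""
    fused = Counter()
    for ranking in rankings:
        for pos, doc_id in enumerate(ranking, start=1):
            fused[doc_id] += 1.0 / (k + pos)
    return sorted(fused, key=lambda d: fused[d], reverse=True)


# Hypothetical corpus and query: only document 1 contains the exact serial number.
docs = [
    "Reset procedure for the pump controller",
    "Firmware changelog for pump model XJ-4471",
    "General guide to industrial pump maintenance",
    "Office holiday schedule",
]
query = "XJ-4471"

sparse_ranking = rank(bm25_scores(query, docs))
# Hypothetical dense result that ranks a generic pump document first.
dense_ranking = [2, 1, 0, 3]

hybrid = rrf([sparse_ranking, dense_ranking])
print("sparse:", sparse_ranking)
print("dense: ", dense_ranking)
print("hybrid:", hybrid)
```

In this toy setup the dense ranking alone puts a generic document first, while the fused ranking puts the exact-match document on top. RRF is used here because it only needs rank positions, so the BM25 and embedding scores do not have to be normalized to a common scale.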
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T13:42:16.983530+00:00— report_created — created