Report #84411
[counterintuitive] Pure vector embedding similarity search is the best retrieval method for RAG
Use hybrid search (combining vector search with traditional keyword search such as BM25) and merge the two ranked result lists with reciprocal rank fusion (RRF).
Journey Context:
Developers often assume semantic search replaces keyword search. However, vector embeddings tend to handle exact matches poorly for out-of-vocabulary words, specific IDs, serial numbers, or names with slight typos. If a user searches for 'HNSW vs IVFFlat', a vector search might return documents about general ANN concepts and miss the ones that contain the exact acronyms. Hybrid search combines the semantic understanding of vectors with the exact-match precision of sparse retrieval, so documents containing the literal query terms are far less likely to be missed.
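The fusion step can be sketched as follows. This is a minimal illustration, not a verified implementation: it assumes each retriever has already returned an ordered list of document IDs, and the IDs and both result lists are made up for the 'HNSW vs IVFFlat' query. The constant k=60 is a commonly used default for RRF, not a value taken from this report.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of doc IDs into one list.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    where rank is 1-based. Ties are broken by doc ID for determinism.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical results for the query 'HNSW vs IVFFlat'.
# The vector search ranks a general ANN overview first.
vector_hits = ["ann_overview", "hnsw_vs_ivfflat", "vector_db_intro"]
# The keyword (e.g. BM25) search ranks the exact-acronym match first.
keyword_hits = ["hnsw_vs_ivfflat", "ivfflat_tuning", "ann_overview"]

fused = reciprocal_rank_fusion([vector_hits, keyword_hits])
print(fused)
```

Because RRF uses only rank positions, the raw BM25 and cosine-similarity scores never need to be normalized against each other. In this made-up example the document that both retrievers rank highly, hnsw_vs_ivfflat, moves to the top of the fused list.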
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T00:16:41.167660+00:00— report_created — created