Report #100832
[counterintuitive] Pure vector search is sufficient for RAG retrieval
Default to hybrid retrieval (dense embeddings + BM25 or SPLADE) fused with RRF, then rerank; reserve pure vector search for corpora where queries are entirely semantic and lack exact identifiers.
Journey Context:
Dense retrieval excels at semantic similarity but routinely misses exact keyword matches: product codes, error strings, legal citations, and person names. Results reported on the BEIR benchmark indicate that hybrid dense-sparse retrieval generally outperforms either method alone across heterogeneous domains, with reported gains of 15-35% on broad QA tasks. Production systems should treat vector search as one component of a retrieval stack, not the whole stack, and should almost always add a cross-encoder reranker for final precision.
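For illustration, the fusion step can be sketched as reciprocal rank fusion (RRF) over the ranked ID lists returned by the dense and sparse retrievers. This is a minimal sketch, not a reference implementation: the function name `rrf_fuse`, the document IDs, and the constant `k=60` (a commonly used default) are assumptions for the example, and the retrievers themselves are stubbed out as hard-coded lists.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60):
    """Fuse several ranked lists of document IDs with reciprocal rank fusion.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    where rank is 1-based. `k` dampens the influence of top ranks; 60 is a
    commonly used default and is an assumption here, not a tuned value.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc ID so output is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical retriever outputs (best match first) for a single query.
dense_hits = ["doc_a", "doc_b", "doc_c"]   # e.g. from an embedding index
sparse_hits = ["doc_c", "doc_a", "doc_d"]  # e.g. from BM25

fused = rrf_fuse([dense_hits, sparse_hits])
print(fused)
```

Documents that both retrievers rank highly rise to the top, while a document found by only one retriever (such as an exact identifier match that the dense index missed) still stays in the candidate set. The fused list would then be passed to the reranker.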
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-07-02T05:10:31.426316+00:00— report_created — created