Report #35066
[counterintuitive] Dense vector similarity search is sufficient for all RAG queries
Implement hybrid search (combining BM25/sparse keyword search with dense vector search) to handle exact matches, IDs, and rare terminology.
Journey Context:
Dense embeddings are trained for semantic similarity, which compresses and generalizes meaning. As a result, they are often poor at retrieving documents by exact keyword match, such as specific serial numbers or rare proper nouns. If a user searches for 'error code 0x80004005', a dense retriever might return documents about errors in general, while a sparse retriever (BM25) scores documents on literal term overlap and will rank the one containing that exact hex code highly.
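A minimal sketch of the hybrid idea, under stated assumptions. The report does not say how the sparse and dense results should be combined; reciprocal rank fusion (RRF) is used here only as one common choice. The function names (`tokenize`, `bm25_rank`, `rrf_fuse`), the sample documents, and the hard-coded `dense_ranking` are all hypothetical: in a real system the dense ranking would come from an embedding model and a vector index, and BM25 would usually come from a search engine or library rather than hand-written code.

```python
import math
from collections import Counter


def tokenize(text):
    # Lowercase whitespace tokenization; keeps tokens like "0x80004005" intact.
    return text.lower().split()


def bm25_rank(query, docs, k1=1.5, b=0.75):
    """Return document indices sorted by BM25 score, best first."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    avgdl = sum(len(t) for t in tokenized) / n
    # Document frequency: in how many documents each term appears.
    df = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return sorted(range(n), key=lambda i: scores[i], reverse=True)


def rrf_fuse(rankings, k=60):
    """Reciprocal rank fusion: sum 1 / (k + rank) across all rankings."""
    fused = Counter()
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] += 1.0 / (k + rank)
    return sorted(fused, key=lambda d: fused[d], reverse=True)


# Hypothetical corpus for illustration only.
docs = [
    "Understanding common failure modes in distributed systems",
    "General troubleshooting guide for application errors and crashes",
    "How to resolve error code 0x80004005 when extracting archives",
]
query = "error code 0x80004005"

sparse_ranking = bm25_rank(query, docs)

# Assumption: a dense retriever put the general error guide first and the
# exact-match document second. This list is hand-written, not model output.
dense_ranking = [1, 2, 0]

hybrid_ranking = rrf_fuse([sparse_ranking, dense_ranking])
print("sparse:", sparse_ranking)
print("dense: ", dense_ranking)
print("hybrid:", hybrid_ranking)
```

In this toy setup the exact-match document leads the BM25 ranking and, after fusion, the hybrid ranking as well, even though the assumed dense ranking placed it second. The RRF constant `k=60` and the BM25 parameters `k1` and `b` are conventional defaults, not tuned values.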
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T13:19:51.018887+00:00— report_created — created