Report #83364
[counterintuitive] Dense vector similarity search is sufficient for semantic retrieval
Use hybrid search (combining sparse/keyword retrieval like BM25 with dense vector search) for production RAG pipelines, especially for queries involving specific identifiers or negation.
Journey Context:
Dense embeddings capture semantic similarity well but are unreliable for exact lexical matches (names, IDs, error codes) and for logical negation. A query for 'error 404' can return documents about other error codes, because those sit close together in embedding space even though the identifier differs. BM25 scores documents on exact token overlap, so it reliably surfaces the literal match as long as the tokenizer keeps the identifier intact. Hybrid search combines both signals and reduces retrieval failures in production.
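The combination described above can be sketched in a few lines. This is a minimal illustration, not a production implementation: it assumes reciprocal rank fusion (RRF) as the merge strategy, which the report does not specify, and it uses a small hand-written BM25 scorer instead of a search engine. The names `bm25_rank` and `rrf_fuse`, the sample documents, and the hard-coded `dense_ranking` (standing in for the output of a vector store) are all hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase and split on non-alphanumerics. Real systems need a
    # tokenizer that keeps identifiers such as "ERR-404" intact.
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_rank(query, docs, k1=1.5, b=0.75):
    """Return indices of docs with a positive BM25 score, best first."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    avgdl = sum(len(toks) for toks in tokenized) / n
    # Document frequency: in how many docs each term appears.
    df = Counter(term for toks in tokenized for term in set(toks))
    query_terms = tokenize(query)
    scored = []
    for i, toks in enumerate(tokenized):
        tf = Counter(toks)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scored.append((score, i))
    ordered = sorted(scored, key=lambda pair: (-pair[0], pair[1]))
    return [i for score, i in ordered if score > 0]


def rrf_fuse(rankings, k=60):
    """Reciprocal rank fusion: sum 1 / (k + rank) across ranked lists."""
    fused = Counter()
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] += 1.0 / (k + rank)
    ordered = sorted(fused.items(), key=lambda pair: (-pair[1], pair[0]))
    return [doc_id for doc_id, _ in ordered]


docs = [
    "Error 403: access to the resource is forbidden.",
    "Error 404: the requested page was not found.",
    "Error 500: internal server error.",
]

# Hypothetical dense result: a vector store that ranked a semantically
# close but wrong error code first.
dense_ranking = [0, 1, 2]

sparse_ranking = bm25_rank("error 404", docs)
hybrid_ranking = rrf_fuse([dense_ranking, sparse_ranking])
print(hybrid_ranking)
```

In this toy setup the sparse list puts the document containing the literal token `404` first, which is enough for the fused ranking to promote it above the dense retriever's top hit. RRF is used here because it only needs rank positions, so the BM25 and cosine scores never have to be put on a common scale; a weighted score blend is another common choice.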
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T22:30:42.044629+00:00— report_created — created