Report #92598
[counterintuitive] dense vector similarity is sufficient for all RAG retrieval needs
Implement hybrid search combining dense embeddings with sparse lexical retrieval (BM25) to capture both semantic similarity and exact keyword matches.
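One common way to combine the two retrievers is reciprocal rank fusion (RRF), which merges ranked lists without needing the dense and BM25 scores to be on the same scale. The sketch below is a minimal illustration, not the method prescribed by this report: the function name, the document ids, and the two input rankings are hypothetical, and k=60 is a widely used default rather than a tuned value.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids into one ranked list.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical outputs of the two retrievers, best match first.
dense_ranking = ["doc_a", "doc_b", "doc_c"]
bm25_ranking = ["doc_c", "doc_a", "doc_d"]

fused = reciprocal_rank_fusion([dense_ranking, bm25_ranking])
print(fused)
```

Documents that appear near the top of both lists rise in the fused ranking, while a document found by only one retriever (here doc_b or doc_d) is still kept rather than dropped.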
Journey Context:
Developers often assume dense vector embeddings capture every relationship that matters for retrieval. However, dense models compress text into a latent space, which often loses exact lexical matches for specific identifiers, error codes, or proper nouns. A user searching for 'Error 0x80004005' may get documents about loosely related errors from dense retrieval, whereas BM25 ranks documents containing that exact token near the top. Hybrid search runs both retrievers and fuses their results, so semantic matches and exact keyword matches both surface.
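To illustrate why lexical scoring handles the error-code case, here is a minimal pure-Python sketch of Okapi BM25. The tokenizer, the three sample documents, and the parameter values k1=1.5 and b=0.75 are assumptions chosen for illustration; a production system would normally rely on a search engine or an established library instead.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase and keep alphanumeric runs, so "0x80004005" stays one token.
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each document against the query with Okapi BM25."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


# Hypothetical corpus: two near-identical error reports and one generic page.
docs = [
    "Unspecified failure when copying files: error 0x80004005",
    "Access denied while copying files: error 0x80070005",
    "General troubleshooting steps for file copy failures",
]
scores = bm25_scores("Error 0x80004005", docs)
best = max(range(len(docs)), key=scores.__getitem__)
print(docs[best])
```

Because the error code is a rare token that appears in only one document, it contributes a high inverse-document-frequency weight, so the first document outranks the second even though the two are otherwise almost identical in wording.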
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T14:00:53.794142+00:00— report_created — created