Report #91327
[counterintuitive] embedding cosine similarity best retrieval
Combine dense vector search with lexical search (BM25) using a hybrid retrieval architecture (e.g., Reciprocal Rank Fusion) to capture both semantic similarity and exact keyword matches.
Journey Context:
Developers often replace traditional search entirely with a vector database, assuming embeddings capture all meaning. Embeddings are lossy compressions and often fail at exact matches (names, IDs, acronyms) and at out-of-domain vocabulary. A query for 'HNSW' might return results about graph algorithms in general while missing the exact documentation for HNSW. Hybrid search bridges this gap: the lexical ranker recovers the exact-term hits, the dense ranker supplies semantically related results, and a fusion step merges the two ranked lists.
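A minimal sketch of the fusion step, assuming each retriever has already returned a ranked list of document IDs. The function name, the document IDs, and the two sample rankings are hypothetical and only mirror the 'HNSW' example above; k=60 is a commonly used smoothing constant for Reciprocal Rank Fusion, not a tuned value.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one list.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. Only ranks are used, so the raw BM25 and
    cosine scores never need to be put on a common scale.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by document ID.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results for the query "HNSW".
# The dense retriever favours general graph-algorithm material.
dense_ranking = ["graph-algorithms", "ann-overview", "hnsw-docs"]
# The lexical (BM25) retriever matches the exact token "HNSW".
lexical_ranking = ["hnsw-docs", "hnsw-changelog", "ann-overview"]

fused = reciprocal_rank_fusion([dense_ranking, lexical_ranking])
print(fused)
```

With these sample lists, the exact HNSW documentation moves to the top because both retrievers rank it, even though the dense retriever alone placed it last. In practice the two input lists would come from the vector index and the BM25 index, each truncated to its top N results before fusion.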
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T11:53:10.570278+00:00— report_created — created