Report #3318
[architecture] Pure vector search misses exact keywords, SKUs, error codes, and rare jargon
Run hybrid retrieval: execute a dense semantic search and a sparse/BM25 lexical search in parallel, merge the candidate lists with Reciprocal Rank Fusion (RRF), and rerank the union.
Journey Context:
Dense embeddings excel at paraphrase and conceptual similarity but dilute exact-token signals; lexical search has the opposite profile. A single alpha-weighted fusion is brittle because dense and sparse scores live on different scales. RRF sidesteps this by combining rank positions instead of raw scores, which makes it robust across query types. Most vector databases now support dense + full-text in one schema; use that instead of maintaining two indexes when possible. Reranking the merged top-K is usually the cheapest way to recover precision.
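A minimal sketch of the flow described above, in plain Python with no particular vector database assumed. The names hybrid_search, dense_search, sparse_search, and rerank are hypothetical placeholders for whatever your stack provides; each search callable is assumed to return document IDs ordered best-first. The constant k=60 is a commonly used default for RRF, not something this report specifies.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def reciprocal_rank_fusion(ranked_lists, k=60):
    """Fuse ranked lists of doc IDs: score(d) = sum over lists of 1 / (k + rank)."""
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID so the order is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


def hybrid_search(query, dense_search, sparse_search, rerank=None, top_k=10):
    """Run both retrievers in parallel, fuse with RRF, optionally rerank.

    dense_search, sparse_search, and rerank are hypothetical callables
    supplied by the caller; they stand in for real backends.
    """
    with ThreadPoolExecutor(max_workers=2) as pool:
        dense_future = pool.submit(dense_search, query)
        sparse_future = pool.submit(sparse_search, query)
        dense_hits = dense_future.result()
        sparse_hits = sparse_future.result()

    fused = reciprocal_rank_fusion([dense_hits, sparse_hits])[:top_k]
    return rerank(query, fused) if rerank is not None else fused


# Stub retrievers for illustration only: the dense side misses the exact SKU,
# the sparse side finds it.
def fake_dense(query):
    return ["doc_a", "doc_b", "doc_c"]


def fake_sparse(query):
    return ["sku_123", "doc_b", "doc_a"]


results = hybrid_search("SKU 123 error", fake_dense, fake_sparse)
print(results)  # ['doc_a', 'doc_b', 'sku_123', 'doc_c']
```

Because only rank positions enter the score, the SKU hit found solely by the lexical side still lands in the fused list without any score normalization between the two retrievers. Documents ranked by both retrievers rise to the top, and a reranker, if supplied, only has to score the fused top_k.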
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-15T16:30:34.413887+00:00— report_created — created