Report #670
[architecture] Weighted-average hybrid search dilutes good results because BM25 and cosine scores live on different scales
Fuse keyword and vector retrievers with Reciprocal Rank Fusion (RRF): score each document by the sum of 1/(k + rank_i) across retrievers, using k≈60. Do not average raw scores.
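A minimal sketch of the RRF formula in Python, assuming each retriever returns a list of document IDs ordered best first and that ranks are 1-based. The function name `rrf_fuse` and the document IDs are hypothetical, chosen only for illustration.

```python
from collections import defaultdict

def rrf_fuse(rankings, k=60):
    """Fuse ranked lists of doc IDs with Reciprocal Rank Fusion.

    rankings: iterable of lists, each ordered best-first (assumed input shape).
    Returns (doc_id, fused_score) pairs, highest score first.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        # Ranks are 1-based, so the top hit contributes 1 / (k + 1).
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by fused score descending; break ties by doc ID for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))

# Hypothetical result lists from a keyword retriever and a vector retriever.
bm25_ranking = ["doc_a", "doc_b", "doc_c"]
vector_ranking = ["doc_b", "doc_d", "doc_a"]

fused = rrf_fuse([bm25_ranking, vector_ranking])
for doc_id, score in fused:
    print(f"{doc_id}: {score:.6f}")
```

A document missing from one list simply receives no contribution from that retriever, which is why `doc_b` and `doc_a` (found by both) outrank `doc_d` and `doc_c` here.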
Journey Context:
BM25 scores are derived from term frequency and inverse document frequency and have no fixed upper bound; cosine similarities for embeddings are bounded in [-1, 1]. Averaging the two yields an arbitrary number that depends on dataset statistics, so a top BM25 hit can be dragged down by a middling vector score. RRF uses only rank positions, so it is scale-invariant and rewards documents that multiple retrievers agree on. Azure AI Search, Elasticsearch, Milvus, and OpenSearch all offer RRF for hybrid queries. Use RRF when queries mix exact identifiers, product codes, and names (where BM25 wins) with paraphrase and conceptual similarity (where vectors win). The main cost is running two retrievals and merging the results; tuning k is usually unnecessary because k=60 is a widely used default that performs well across many workloads.
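The scale problem can be illustrated with a small sketch. The scores below are made up, and both helper functions are hypothetical; they assume every document appears in both score dictionaries. Rescaling the BM25 scores (as a different corpus or query length might) leaves each retriever's ranking unchanged, yet it flips the order produced by averaging, while the rank-based order stays the same.

```python
def order_by_average(bm25, cosine, weight=0.5):
    # Weighted average of raw scores; sensitive to each retriever's scale.
    fused = {d: weight * bm25[d] + (1 - weight) * cosine[d] for d in bm25}
    return sorted(fused, key=fused.get, reverse=True)

def order_by_rrf(bm25, cosine, k=60):
    # Convert each score dict to a ranking, then sum 1 / (k + rank).
    fused = {d: 0.0 for d in bm25}
    for scores in (bm25, cosine):
        ranked = sorted(scores, key=scores.get, reverse=True)
        for rank, d in enumerate(ranked, start=1):
            fused[d] += 1.0 / (k + rank)
    return sorted(fused, key=fused.get, reverse=True)

# Made-up scores for three documents.
bm25 = {"a": 12.0, "b": 11.0, "c": 3.0}
cosine = {"a": 0.20, "b": 0.90, "c": 0.40}
# Same BM25 ranking, different scale.
bm25_rescaled = {d: s * 0.1 for d, s in bm25.items()}

avg_original = order_by_average(bm25, cosine)
avg_rescaled = order_by_average(bm25_rescaled, cosine)
rrf_original = order_by_rrf(bm25, cosine)
rrf_rescaled = order_by_rrf(bm25_rescaled, cosine)

print("average:", avg_original, "->", avg_rescaled)
print("rrf:    ", rrf_original, "->", rrf_rescaled)
```

With these numbers the averaged order moves from a, b, c to b, a, c after rescaling, while RRF returns b, a, c in both cases because only rank positions enter the sum.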
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-13T11:52:36.122120+00:00— report_created — created