Report #814
[architecture] How to combine lexical and semantic results in a RAG retriever
Run dense and sparse searches in parallel and merge with Reciprocal Rank Fusion (RRF) as the safe default. Use rank-based fusion when score scales are incompatible or you lack labeled tuning data; switch to a weighted score combination (e.g., min-max normalized linear blend) only if you have a representative evaluation set to learn weights and can recalibrate as the data shifts.
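For comparison with the rank-based default, here is a minimal sketch of the weighted alternative, a min-max normalized linear blend. The function names (`min_max`, `weighted_blend`), the `alpha` parameter, the treatment of documents missing from one list (scored 0.0), and the sample scores are all illustrative assumptions, not part of any particular library.

```python
def min_max(scores):
    """Rescale a {doc_id: score} mapping to the range [0, 1]."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # Assumption: when every score is identical, map them all to 1.0.
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def weighted_blend(sparse, dense, alpha=0.5):
    """Blend normalized scores: alpha * dense + (1 - alpha) * sparse.

    Assumption: a document absent from one result list contributes 0.0
    for that retriever.
    """
    s, d = min_max(sparse), min_max(dense)
    fused = {
        doc: alpha * d.get(doc, 0.0) + (1 - alpha) * s.get(doc, 0.0)
        for doc in set(s) | set(d)
    }
    # Highest fused score first; ties broken by doc id for determinism.
    return sorted(fused.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical raw scores: BM25 is unbounded, cosine lies in [-1, 1].
bm25 = {"d1": 12.0, "d2": 7.0, "d3": 2.0}
cosine = {"d2": 0.90, "d3": 0.80, "d4": 0.70}
blended = weighted_blend(bm25, cosine, alpha=0.5)
```

Because the normalization depends on the minimum and maximum of each result list, one unusually high raw score compresses every other document toward 0. This is why `alpha` should only be tuned against a held-out query set.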
Journey Context:
RRF avoids comparing raw BM25 scores to cosine similarities by using ranks: score(d) = Σ 1/(k + rank), summed over every result list in which document d appears. It is robust to outlier scores, domain drift, and new document batches, which is why engines such as Weaviate and OpenSearch offer it as a built-in fusion method. Score-based normalization can slightly outperform RRF when weights are calibrated on in-domain data, but it is brittle: a single high-scoring outlier or an embedding distribution shift can dominate the ranking. Two practical architectures are a sparse first stage whose candidates are re-scored by the dense model, with the two resulting rankings merged by RRF, or two parallel retrievals fused with RRF and a small cross-encoder re-ranker on top. Do not tune alpha or weights without a held-out query set; without one, use RRF with k=60, the conventional no-tuning default.
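The RRF formula above can be sketched in a few lines. This is a minimal illustration, assuming each retriever returns an ordered list of document ids (best first) with 1-based ranks; the function name `rrf_fuse` and the sample hit lists are hypothetical.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60):
    """Fuse ranked lists of doc ids with Reciprocal Rank Fusion.

    Each document accumulates 1 / (k + rank) for every list it appears in.
    Raw retriever scores are never used, only positions.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc id for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical top-3 results from each retriever, best hit first.
bm25_hits = ["d3", "d1", "d7"]
dense_hits = ["d1", "d9", "d3"]

fused = rrf_fuse([bm25_hits, dense_hits])
fused_ids = [doc_id for doc_id, _ in fused]
```

Here `d1` (ranks 2 and 1) edges out `d3` (ranks 1 and 3), and documents found by only one retriever fall below both. If a cross-encoder re-ranker is used, it would take the top of `fused_ids` as its candidate set.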
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-13T13:53:40.363772+00:00— report_created — created