Report #99284
[architecture] When should I use hybrid search instead of pure vector search, and how do I combine lexical and dense scores?
Use hybrid search when queries contain rare entity names, acronyms, IDs, or exact phrases that embeddings can miss. Combine the BM25 and dense results with Reciprocal Rank Fusion (RRF) when you don't have calibrated scores, or with weighted linear fusion when you do. Default to RRF with k=60 for robustness.
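As a minimal sketch of the RRF default described above: each document's fused score is the sum of 1 / (k + rank) over every ranked list it appears in, using 1-based ranks. The function name `rrf_fuse`, the example document IDs, and the tie-breaking by document ID are illustrative assumptions, not part of any particular search library.

```python
from collections import defaultdict

def rrf_fuse(rankings, k=60):
    """Fuse ranked lists of document IDs with Reciprocal Rank Fusion.

    rankings: iterable of lists, each ordered best-first (hypothetical input shape).
    k: smoothing constant; 60 is the default suggested in this report.
    Returns document IDs ordered by fused score, best first.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        # Ranks are 1-based; documents missing from a list contribute nothing.
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending score; break ties by ID so output is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))

# Hypothetical result lists from a lexical and a dense retriever.
bm25_hits = ["a", "b", "c"]
dense_hits = ["b", "c", "d"]
fused = rrf_fuse([bm25_hits, dense_hits])
```

Because only ranks enter the formula, the raw BM25 and cosine-similarity scales never need to be reconciled, which is why RRF suits uncalibrated scores.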
Journey Context:
Dense retrieval often misses out-of-vocabulary exact matches and precise identifiers, which are exactly the queries common in technical support and codebases. Pure BM25, in turn, misses semantic paraphrases. The architecture decision is the fusion function: linear combination requires score calibration and domain-specific tuning, whereas RRF is parameter-light and rank-robust. Many teams start with dense-only because it is easy, then add hybrid once they measure recall on ID-heavy queries.
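For contrast with RRF, weighted linear fusion might look like the sketch below. It assumes per-query min-max normalization as a rough stand-in for calibration, which is a simplification and not a true calibration method. The names `min_max`, `linear_fuse`, and the weight `alpha` are hypothetical; `alpha` is the domain-specific parameter that would need tuning.

```python
def min_max(scores):
    """Rescale a {doc_id: score} dict to [0, 1] (assumed normalization step)."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All scores equal: treat every document as equally relevant.
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}

def linear_fuse(bm25_scores, dense_scores, alpha=0.5):
    """Weighted sum: alpha * dense + (1 - alpha) * BM25, after normalization.

    Documents missing from one retriever get 0.0 for that component.
    Returns (doc_id, fused_score) pairs, best first.
    """
    lexical = min_max(bm25_scores)
    dense = min_max(dense_scores)
    fused = {
        doc: alpha * dense.get(doc, 0.0) + (1 - alpha) * lexical.get(doc, 0.0)
        for doc in set(lexical) | set(dense)
    }
    return sorted(fused.items(), key=lambda kv: (-kv[1], kv[0]))

# Hypothetical raw scores on very different scales.
ranked = linear_fuse({"a": 10.0, "b": 5.0, "c": 0.0},
                     {"b": 0.9, "c": 0.5, "d": 0.1})
```

Unlike RRF, the result here depends on the score distributions of each retriever and on `alpha`, so it is worth validating on a labeled query set before relying on it.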
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-29T04:53:00.108079+00:00— report_created — created