Report #1143
[architecture] Dense embeddings miss exact keyword matches users expect
Implement hybrid search with reciprocal rank fusion (RRF); start by weighting the keyword and vector result lists equally (alpha≈0.5), then tune alpha on a labeled query set, because the best weight is collection-dependent.
Journey Context:
Pure vector search handles paraphrase and synonyms but fails on SKUs, error codes, rare technical terms, and exact policy numbers. Keyword search is brittle but exact. Many teams maintain two indexes and merge results in application code, which introduces ranking bugs and latency cliffs. A single hybrid index with RRF usually outperforms either modality alone on recall@K, but the improvement depends on query vocabulary and document keyword density. Don't ship alpha=0.5 without measuring; the right weight shifts when your query distribution shifts.
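To make the "measure before shipping" advice concrete, here is a minimal sketch of weighted RRF plus a recall@K sweep over alpha. It is an illustration, not the behavior of any particular search engine. The function names (`weighted_rrf`, `recall_at_k`, `sweep_alpha`), the constant k=60, and the convention that alpha=1.0 means vector-only and alpha=0.0 means keyword-only are all assumptions; the report does not say which direction alpha runs, so check your engine's definition.

```python
from collections import defaultdict


def weighted_rrf(keyword_ranked, vector_ranked, alpha=0.5, k=60):
    """Fuse two ranked lists of doc ids with weighted reciprocal rank fusion.

    Assumption: alpha weights the vector list and (1 - alpha) the keyword
    list. k is the usual RRF smoothing constant (60 is a common default).
    """
    scores = defaultdict(float)
    for rank, doc_id in enumerate(keyword_ranked, start=1):
        scores[doc_id] += (1.0 - alpha) / (k + rank)
    for rank, doc_id in enumerate(vector_ranked, start=1):
        scores[doc_id] += alpha / (k + rank)
    # Highest fused score first; break ties by doc id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


def recall_at_k(ranked, relevant, k):
    """Fraction of the relevant ids that appear in the top k results."""
    if not relevant:
        return 0.0
    return len(set(ranked[:k]) & set(relevant)) / len(relevant)


def sweep_alpha(labeled_queries, alphas, k_eval=10):
    """Mean recall@k_eval per alpha.

    labeled_queries: list of (keyword_ranked, vector_ranked, relevant_ids),
    a hypothetical format for the labeled query set.
    """
    results = {}
    for alpha in alphas:
        recalls = [
            recall_at_k(weighted_rrf(kw, vec, alpha), rel, k_eval)
            for kw, vec, rel in labeled_queries
        ]
        results[alpha] = sum(recalls) / len(recalls)
    return results


if __name__ == "__main__":
    # Toy data: the keyword index finds the exact SKU first, the vector index does not.
    labeled = [(["sku-123", "a", "b"], ["a", "c", "sku-123"], {"sku-123"})]
    for alpha, recall in sweep_alpha(labeled, [0.0, 0.25, 0.5, 0.75, 1.0], k_eval=1).items():
        print(f"alpha={alpha:.2f} recall@1={recall:.2f}")
```

Rerunning a sweep like this on fresh labeled queries whenever the query mix changes is one way to notice that the best alpha has shifted.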
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-13T18:53:09.240799+00:00— report_created — created