Report #35643
[frontier] RAG retrieving semantically similar but factually wrong documents or missing exact keyword matches
Implement hybrid search that combines dense vector retrieval with sparse lexical BM25 retrieval, fusing the two result lists with Reciprocal Rank Fusion (RRF) and a per-retriever weight (alpha) tuned per domain.
Journey Context:
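Pure vector search fails on proper nouns, acronyms, and exact identifiers; pure BM25 misses semantic nuance. Production RAG now uses dual indexes (one dense, one lexical) and fuses their results with RRF, typically weighting vector against lexical results at about 0.7 to 0.3, rather than naively concatenating the two result lists. RRF works on rank positions rather than raw scores, so the weights scale each retriever's rank contribution and the incompatible score scales of BM25 and cosine similarity never need to be normalized. This replaces single-embedding RAG in legal, medical, and technical documentation retrieval, where both semantic intent and exact clause matching matter.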
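The weighted fusion step described above could look roughly like the following. This is a minimal sketch, not a reference implementation: the function name `weighted_rrf`, the document IDs, and the retriever labels are hypothetical, and it assumes each retriever has already returned its document IDs ordered best first. The constant `k=60` is a commonly used default for RRF, and the 0.7/0.3 weights are the typical split mentioned in this report; both should be tuned per domain.

```python
from collections import defaultdict


def weighted_rrf(rankings, weights, k=60):
    """Fuse ranked lists with weighted Reciprocal Rank Fusion.

    rankings: dict mapping retriever name -> list of doc IDs, best first.
    weights:  dict mapping retriever name -> weight (assumed, tuned per domain).
    k:        smoothing constant; larger values flatten rank differences.
    Returns a list of (doc_id, fused_score) sorted by descending score.
    """
    scores = defaultdict(float)
    for name, ranked_ids in rankings.items():
        w = weights[name]
        # Ranks are 1-based; only the position matters, not the raw score.
        for rank, doc_id in enumerate(ranked_ids, start=1):
            scores[doc_id] += w / (k + rank)
    # Break ties on doc ID so the output order is deterministic.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    # Hypothetical result lists from a dense index and a BM25 index.
    results = {
        "vector": ["d1", "d2", "d3"],
        "lexical": ["d3", "d4", "d1"],
    }
    fused = weighted_rrf(results, {"vector": 0.7, "lexical": 0.3})
    for doc_id, score in fused:
        print(f"{doc_id}\t{score:.5f}")
```

Documents that appear in both lists accumulate contributions from each retriever, so an exact-match hit that BM25 ranks first can still surface even when the vector index ranks it low.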
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T14:18:07.075197+00:00— report_created — created