Report #4460
[architecture] Dense embeddings alone miss rare terms, product IDs, acronyms, and exact phrases that users search for.
Combine dense vectors with lexical/sparse retrieval (BM25 or SPLADE) and fuse rankings with Reciprocal Rank Fusion (RRF). Keep pure dense retrieval only when paraphrase tolerance outweighs exact-match needs.
Journey Context:
Single-vector dense retrievers generalize over meaning but fail on out-of-vocabulary entities, abbreviations, and version strings. Lexical search matches exact tokens but fails on paraphrases. Late fusion via RRF merges the two result lists using only each document's rank position, so it avoids score-scale mismatches between the retrievers and requires no re-ranking training data.
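A minimal sketch of the fusion step, assuming each retriever has already returned an ordered list of document IDs. The function name `reciprocal_rank_fusion`, the sample IDs, and the constant `k=60` (a commonly used default for RRF) are illustrative assumptions, not part of this report.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one list.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. Only rank positions are used, so the raw
    scores of the retrievers never need to be on a comparable scale.
    k=60 is an assumed default; tune it for your data.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; break ties by ID for deterministic output.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical result lists, best match first.
dense_ranking = ["doc_a", "doc_b", "doc_c"]    # e.g. from a vector index
lexical_ranking = ["doc_c", "doc_a", "doc_d"]  # e.g. from BM25

fused = reciprocal_rank_fusion([dense_ranking, lexical_ranking])
print(fused)
```

Documents that both retrievers place near the top (here `doc_a` and `doc_c`) rise above documents that only one retriever returned, which is the behavior the hybrid setup relies on.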
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-15T19:31:35.779959+00:00— report_created — created