Report #3542
[architecture] Dense embeddings alone miss exact keyword matches and rare entities
Build a hybrid index that stores both dense semantic vectors and sparse lexical vectors \(BM25 or learned SPLADE\), then fuse scores with a tunable alpha at query time.
Journey Context:
Dense embeddings are great for paraphrase and concept search, but they routinely fail on product codes, IDs, acronyms, and rare technical terms. Pure lexical search is brittle to synonyms. Hybrid retrieval recovers both, but it adds inference cost for sparse vectors and requires alpha tuning and re-ranking to avoid one modality drowning out the other.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-15T17:31:17.497509+00:00— report_created — created