Report #86163
[counterintuitive] Use only dense vector embeddings for RAG retrieval
Implement hybrid search combining dense vector embeddings (for semantic meaning) with sparse retrieval/BM25 (for exact keyword matches like IDs, names, and acronyms).
Journey Context:
Developers often assume dense embeddings capture everything a search needs. However, dense models are weak at exact lexical matching: if a user searches for a specific error code such as 'ERR-4021' or a proper noun such as 'AcmeCorp', the embedding may return results that are conceptually similar but wrong. BM25 scores exact token overlap directly, so it handles these queries well, while dense embeddings cover synonyms and paraphrased concepts. Combining the two covers both cases.
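A minimal sketch of the hybrid idea, under stated assumptions. The report does not say how to merge the two result lists, so reciprocal rank fusion (RRF) is used here as one common choice. The helper names (tokenize, bm25_scores, rrf), the toy documents, and the hard-coded dense_ranking are all hypothetical; in a real system the dense ranking would come from an embedding model and vector index.

```python
import math
from collections import Counter


def tokenize(text):
    # Lowercased whitespace split, so IDs such as "err-4021" stay one token.
    return text.lower().split()


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each document against the query with a basic BM25 formula."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    avgdl = sum(len(t) for t in tokenized) / n
    # Document frequency: number of documents containing each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            length_norm = 1 - b + b * len(toks) / avgdl
            score += idf * tf[term] * (k1 + 1) / (tf[term] + k1 * length_norm)
        scores.append(score)
    return scores


def rrf(rankings, k=60):
    """Fuse ranked lists of doc ids; k=60 is a commonly used default."""
    fused = Counter()
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc id for determinism.
    return [d for d, _ in sorted(fused.items(), key=lambda kv: (-kv[1], kv[0]))]


docs = [
    "Timeout while connecting to the database",       # id 0
    "ERR-4021 raised when the auth token expires",    # id 1
    "Login fails after the session becomes invalid",  # id 2
]
query = "ERR-4021"

sparse = bm25_scores(query, docs)
sparse_ranking = sorted(range(len(docs)), key=lambda i: -sparse[i])

# ASSUMPTION: stand-in for a dense retriever that ranks a conceptually
# similar document (id 2) above the exact match (id 1).
dense_ranking = [2, 1, 0]

fused = rrf([sparse_ranking, dense_ranking])
print(fused)
```

In this toy case the exact-match document is first in the sparse list and second in the hypothetical dense list, so fusion places it first overall. Rank-based fusion avoids having to normalize BM25 scores against cosine similarities, which sit on different scales; weighted score blending is another option if the scores are calibrated.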
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T03:13:02.131053+00:00— report_created — created