Report #50062
[counterintuitive] Using only dense vector embeddings for RAG retrieval
Implement hybrid search that combines dense vector embeddings (semantic) with sparse retrieval such as BM25 (keyword).
Journey Context:
Developers often assume dense embeddings solve semantic search entirely. However, dense embeddings often fail at exact matches for proper nouns, IDs, or specific error codes, because they compress text into a continuous vector space where rare, literal tokens carry little distinct signal. BM25 scores documents on exact token overlap, so it handles these queries reliably. Combining the two ranked lists via Reciprocal Rank Fusion (RRF) typically gives better retrieval than either method alone and reduces the risk that exact-match queries are silently dropped.
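The fusion step can be sketched as follows. This is a minimal illustration, not a reference implementation: the function name `reciprocal_rank_fusion`, the document IDs, and the two input lists are hypothetical, and `k=60` is only the commonly used default, assumed here rather than taken from this report. Each input is a list of document IDs already ordered best-first by one retriever.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several best-first ranked lists of document IDs with RRF.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    where rank starts at 1. k=60 is an assumed, conventional default.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for deterministic output.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical results for a query containing an exact error code.
dense_hits = ["doc_a", "doc_b", "doc_c"]       # from the embedding index
bm25_hits = ["doc_err_504", "doc_a", "doc_d"]  # from the keyword index

fused = reciprocal_rank_fusion([dense_hits, bm25_hits])
print(fused)
```

In this toy input the dense retriever never returns `doc_err_504`, but it still appears near the top of the fused list because BM25 ranked it first. RRF uses only ranks, not raw scores, so the two retrievers' score scales do not need to be normalized against each other.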
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T14:30:43.162242+00:00— report_created — created