Report #83511
[counterintuitive] Is vector similarity search enough for semantic RAG retrieval?
Use hybrid search (combining BM25 keyword search and vector search) or late-interaction models (e.g., ColBERT) instead of relying solely on single-vector embeddings.
Journey Context:
Developers often assume dense embeddings capture all semantic meaning. However, a single-vector embedding compresses an entire passage into one vector, so it struggles with exact keyword matches (e.g., specific IDs, proper nouns, error codes) and with negation. BM25 matches exact tokens reliably, while vectors handle synonyms and paraphrases. Combining the two typically yields higher retrieval recall than either method alone.
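A minimal sketch of the hybrid idea, using only the Python standard library. The report does not say how to merge the two result lists; this sketch assumes reciprocal rank fusion (RRF) with the commonly used constant k=60, which is one option among several. The function names (`bm25_rank`, `cosine_rank`, `rrf_fuse`), the sample documents, and the two-dimensional "embeddings" are all hypothetical and hand-made for illustration; a real system would get vectors from an embedding model and would normally use a search engine's built-in BM25.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive whitespace tokenizer; real systems also handle punctuation, stemming, etc.
    return text.lower().split()


def bm25_rank(query, docs, k1=1.5, b=0.75):
    """Return document indices ordered by BM25 score, best first."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    avgdl = sum(len(t) for t in tokenized) / n
    # Document frequency: in how many documents each term appears.
    df = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return sorted(range(n), key=lambda i: scores[i], reverse=True)


def cosine_rank(query_vec, doc_vecs):
    """Return document indices ordered by cosine similarity, best first."""
    def cos(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

    sims = [cos(query_vec, v) for v in doc_vecs]
    return sorted(range(len(doc_vecs)), key=lambda i: sims[i], reverse=True)


def rrf_fuse(rankings, k=60):
    """Reciprocal rank fusion: each list contributes 1 / (k + rank) per document."""
    fused = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] = fused.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(fused, key=lambda d: fused[d], reverse=True)


docs = [
    "Payment failed with error code E4012 during checkout",
    "The transaction could not be completed at purchase time",
    "Resolve a login error by resetting your password",
]
query = "E4012 error"

# Toy vectors (assumption): chosen so the paraphrase in docs[1] sits closest
# to the query, imitating an embedding that blurs the exact error code.
query_vec = [0.9, 0.1]
doc_vecs = [[0.7, 0.3], [0.9, 0.2], [0.1, 0.9]]

keyword_ranking = bm25_rank(query, docs)
vector_ranking = cosine_rank(query_vec, doc_vecs)
hybrid_ranking = rrf_fuse([keyword_ranking, vector_ranking])

print("BM25:  ", keyword_ranking)
print("Vector:", vector_ranking)
print("Hybrid:", hybrid_ranking)
```

In this toy setup the vector ranking puts the paraphrased document first and the one containing the literal code `E4012` second, while BM25 puts the exact match first. After fusion the exact match leads and the paraphrase stays in second place, which is the behavior the recommendation above is after. RRF works on ranks only, so the BM25 and cosine scores never need to be put on a common scale.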
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T22:45:32.598622+00:00— report_created — created