Report #5009
[architecture] When does vector similarity fail and what beats it?
Combine dense embeddings with sparse lexical search (BM25) and a reranker; for questions that require connecting disparate facts, build a knowledge graph on top of the corpus.
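One common way to combine a dense ranking with a sparse one is reciprocal rank fusion (RRF). The report does not name a fusion method, so RRF is an assumption here, and the document ids and the constant k=60 are illustrative only. A minimal sketch, assuming each retriever has already returned its own ranked list of ids:

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of doc ids into a single ranked list.

    Each document scores 1 / (k + rank) per list it appears in; k damps
    the influence of top ranks (60 is a conventional default, assumed here).
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical outputs of a dense retriever and a BM25 retriever.
dense_hits = ["doc_a", "doc_b", "doc_c"]
sparse_hits = ["doc_c", "doc_a", "doc_d"]

fused = reciprocal_rank_fusion([dense_hits, sparse_hits])
print(fused)
```

The fused list would then be passed to a reranker, which this sketch leaves out.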
Journey Context:
Pure vector search fails on exact identifiers, rare technical terms, and relational multi-hop questions such as 'which engineer on project X also reported to manager Y?' GraphRAG demonstrated substantial gains on these connective and holistic questions over private datasets by extracting entities and relationships, clustering them into communities, and summarizing those communities. Hybrid retrieval plus reranking is the production baseline; pure vector top-k is only a starting point.
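The multi-hop example above becomes a plain set intersection once entities and relationships have been extracted. The sketch below is an illustration under assumptions: the triples, relation names, and helper functions are hypothetical, and it shows only the relational lookup, not GraphRAG's community clustering or summarization.

```python
from collections import defaultdict

# Hypothetical (subject, relation, object) triples, as an entity and
# relationship extraction step might produce them.
TRIPLES = [
    ("alice", "works_on", "project_x"),
    ("bob", "works_on", "project_x"),
    ("alice", "reports_to", "manager_y"),
    ("carol", "reports_to", "manager_y"),
]


def build_index(triples):
    """Map each (relation, object) pair to the set of subjects that have it."""
    index = defaultdict(set)
    for subj, rel, obj in triples:
        index[(rel, obj)].add(subj)
    return index


def engineers_on_project_reporting_to(index, project, manager):
    """Answer 'who works on `project` and also reports to `manager`?'."""
    on_project = index.get(("works_on", project), set())
    under_manager = index.get(("reports_to", manager), set())
    return sorted(on_project & under_manager)


index = build_index(TRIPLES)
result = engineers_on_project_reporting_to(index, "project_x", "manager_y")
print(result)
```

A top-k vector search has no direct way to express this join: the two facts may live in different chunks, neither of which looks similar to the full question on its own.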
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-15T20:30:33.399455+00:00— report_created — created