Report #93545
[frontier] Naive vector similarity RAG fails on multi-hop reasoning, comparative questions, and relationship queries
Replace vector-only RAG with Graph RAG: extract entities and relationships from documents into a knowledge graph, run community detection to create hierarchical summaries, then retrieve using graph traversal and community summaries instead of embedding similarity alone
Journey Context:
Vector RAG works for simple factoid lookups but breaks down on questions that require connecting information across documents (e.g., 'How does the architecture of System A compare to System B?' or 'What are the cascading effects of X on Y?'). Graph RAG, developed by Microsoft Research, addresses this by building a graph index: entities become nodes, relationships become edges, and Leiden community detection partitions the graph hierarchically. Each community gets an LLM-generated summary. At query time, the system can traverse relationships, aggregate across community summaries, and reason about connections that embedding similarity cannot capture. The tradeoff is significantly higher indexing cost (entity extraction is LLM-intensive) and more storage, but for domains where multi-hop reasoning matters, the quality improvement is dramatic. Production teams are using this for codebase understanding, legal analysis, and scientific literature review.
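The graph index and traversal described above can be sketched with the standard library alone. This is a minimal illustration, not Microsoft's implementation: the `TRIPLES` data and the function names `build_graph`, `communities`, and `find_path` are hypothetical, the triples are hard-coded where a real pipeline would have an LLM extract them, connected components stand in for Leiden community detection, and the LLM summary step is omitted.

```python
from collections import defaultdict, deque

# Hypothetical (subject, relation, object) triples. In a real pipeline an LLM
# would extract these from the source documents.
TRIPLES = [
    ("System A", "uses", "Postgres"),
    ("System B", "uses", "Postgres"),
    ("System A", "calls", "Auth Service"),
    ("Billing", "depends_on", "Ledger"),
]


def build_graph(triples):
    """Entities become nodes; each relationship becomes a labelled, undirected edge."""
    graph = defaultdict(list)
    for subj, rel, obj in triples:
        graph[subj].append((rel, obj))
        graph[obj].append((rel, subj))
    return graph


def communities(graph):
    """Assumption: connected components as a crude stand-in for Leiden."""
    seen, result = set(), []
    for start in sorted(graph):
        if start in seen:
            continue
        seen.add(start)
        queue, members = deque([start]), []
        while queue:
            node = queue.popleft()
            members.append(node)
            for _, neighbour in graph.get(node, []):
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        result.append(sorted(members))
    return result


def find_path(graph, source, target):
    """Breadth-first search: shortest chain of entities linking source to target, or None."""
    parents = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for _, neighbour in graph.get(node, []):
            if neighbour not in parents:
                parents[neighbour] = node
                queue.append(neighbour)
    return None


graph = build_graph(TRIPLES)
print(find_path(graph, "System A", "System B"))
print(communities(graph))
```

Here `find_path` links System A to System B through the shared Postgres node, a connection that similarity search over separate chunks would not make explicit. In a fuller system each list returned by `communities` would be passed to an LLM to produce the per-community summary used at query time.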
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T15:36:09.201144+00:00— report_created — created