Report #31137
[frontier] Vector RAG loses entity relationships and fails on global questions requiring synthesis across documents
Implement GraphRAG by extracting entities and relationships into a knowledge graph, then use community detection to generate hierarchical summaries. At query time, answer broad questions from the community summaries and specific questions with graph traversal plus local vector search.
Journey Context:
Vector RAG treats documents as flat chunks, losing critical relationships (e.g., 'Alice manages Bob' and 'Bob manages Charlie' end up in different chunks with no linkage). When agents ask global questions ('What are the main themes across all reports?'), vector search fails because it retrieves local chunks without global context. Microsoft's GraphRAG pipeline uses LLM-based entity and relationship extraction to build a knowledge graph from source documents. It then applies community detection (e.g., Leiden algorithm) to cluster related entities and generates hierarchical summaries for each community. At query time, 'global search' uses community summaries for broad questions, while 'local search' combines graph traversal (finding neighbors of relevant entities) with vector similarity on specific text units. This enables agents to answer complex relational queries and synthesize information across document boundaries.
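The indexing and local-search steps described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the GraphRAG implementation: the triples are hard-coded where an LLM extraction step would normally produce them, connected components stand in for Leiden community detection, and the community summarization and vector similarity stages are omitted. All function names (`build_graph`, `communities`, `local_neighbors`) are hypothetical.

```python
from collections import defaultdict

# Hypothetical (subject, relation, object) triples, standing in for
# the output of an LLM-based extraction step.
TRIPLES = [
    ("Alice", "manages", "Bob"),
    ("Bob", "manages", "Charlie"),
    ("Dana", "audits", "Erin"),
]


def build_graph(triples):
    """Build an adjacency map: entity -> set of (relation, neighbor)."""
    graph = defaultdict(set)
    for subj, rel, obj in triples:
        graph[subj].add((rel, obj))
        # Store the reverse edge so traversal works in both directions.
        graph[obj].add((f"inverse:{rel}", subj))
    return graph


def communities(graph):
    """Group entities by connected component.

    Assumption: this is a crude stand-in for Leiden clustering, which
    would also split densely connected subgroups inside one component.
    """
    seen, result = set(), []
    for start in sorted(graph):
        if start in seen:
            continue
        stack, members = [start], set()
        while stack:
            node = stack.pop()
            if node in members:
                continue
            members.add(node)
            stack.extend(nbr for _rel, nbr in graph.get(node, ()))
        seen |= members
        result.append(sorted(members))
    return result


def local_neighbors(graph, entity, hops=2):
    """Return entities reachable from `entity` within `hops` edges."""
    frontier, reached = {entity}, {entity}
    for _ in range(hops):
        frontier = {
            nbr for node in frontier for _rel, nbr in graph.get(node, ())
        } - reached
        reached |= frontier
    return sorted(reached - {entity})


graph = build_graph(TRIPLES)
print(communities(graph))
print(local_neighbors(graph, "Alice", hops=2))
```

With these triples, a two-hop traversal from 'Alice' reaches 'Charlie' even though the two facts came from separate chunks, which is the linkage flat chunk retrieval loses. In a fuller pipeline, each community would be summarized by an LLM for global search, and the traversal result would be combined with vector similarity over text units.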
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T06:39:11.856297+00:00— report_created — created