Report #68984
[frontier] Vector RAG returning disconnected fact chunks that fail on multi-hop reasoning questions
Replace naive vector-similarity RAG with GraphRAG: extract entities and relationships from source documents into a knowledge graph, run community detection (Leiden algorithm) to generate summaries at multiple abstraction levels, then query against both the graph structure and community summaries.
Journey Context:
Naive vector RAG retrieves text chunks by embedding similarity, which discards the relational structure between entities. It works for simple factoid lookups but tends to fail on questions that require multi-hop reasoning ('How does X's relationship with Y affect Z's strategy?'), because the facts that connect X, Y, and Z often sit in separate chunks that are not individually similar to the query. GraphRAG builds a knowledge graph from the source documents, detects communities, and pre-generates a summary for each community at each level of the hierarchy. At query time, the question is mapped to the relevant communities and the answer is synthesized across their summaries. Tradeoff: GraphRAG has a significantly higher indexing cost (LLM calls for entity and relationship extraction on every document), and the indexing pipeline is more complex to build and maintain. It is not worth it for simple lookup tasks where vector RAG suffices. For complex reasoning over large corpora (legal analysis, research synthesis, strategic intelligence), it can substantially outperform vector RAG because it preserves and uses the relational structure that chunk-level embeddings flatten.
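The graph-side idea can be sketched in a few lines of standard-library Python. This is an illustrative toy, not a GraphRAG implementation: the triples are hypothetical and hard-coded (a real pipeline would obtain them from an LLM extraction step), `build_graph`, `communities`, and `find_path` are names invented for this sketch, and connected components stand in for Leiden community detection, which needs a dedicated library. Community summarization is omitted entirely.

```python
from collections import defaultdict, deque

# Hypothetical (subject, relation, object) triples. In a real pipeline an LLM
# extraction step would produce these; they are hard-coded for illustration.
TRIPLES = [
    ("Acme", "supplies", "Globex"),
    ("Globex", "competes_with", "Initech"),
    ("Initech", "pursues", "low-cost strategy"),
    ("Umbrella", "acquired", "Hooli"),
]

def build_graph(triples):
    """Build an undirected adjacency map: entity -> list of (relation, neighbor)."""
    graph = defaultdict(list)
    for subj, rel, obj in triples:
        graph[subj].append((rel, obj))
        graph[obj].append((rel, subj))
    return graph

def communities(graph):
    """Group entities by connected component.

    Assumption: this is a simple stand-in for Leiden, not an equivalent.
    """
    seen, groups = set(), []
    for start in graph:
        if start in seen:
            continue
        seen.add(start)
        queue, members = deque([start]), set()
        while queue:
            node = queue.popleft()
            members.add(node)
            for _, nbr in graph.get(node, ()):
                if nbr not in seen:
                    seen.add(nbr)
                    queue.append(nbr)
        groups.append(members)
    return groups

def find_path(graph, source, target):
    """Breadth-first search: shortest chain of (node, relation, neighbor) hops, or None."""
    queue = deque([(source, [])])
    visited = {source}
    while queue:
        node, path = queue.popleft()
        if node == target:
            return path
        for rel, nbr in graph.get(node, ()):
            if nbr not in visited:
                visited.add(nbr)
                queue.append((nbr, path + [(node, rel, nbr)]))
    return None

graph = build_graph(TRIPLES)
groups = communities(graph)
path = find_path(graph, "Acme", "low-cost strategy")
for hop in path:
    print(hop)
```

Here the chain from "Acme" to "low-cost strategy" spans three separate facts. A chunk retriever would have to surface all three independently, whereas the graph traversal follows the explicit edges. Because edges are stored undirected in this toy, a hop may be reported against the direction of the original triple; a fuller version would keep direction on each edge.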
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T22:16:25.884128+00:00— report_created — created