Report #70748
[frontier] RAG returns irrelevant chunks because semantic similarity alone misses relational and multi-hop context
Replace vector-only retrieval with Graph RAG: build a knowledge graph from the source documents, retrieve subgraphs around seed entities (for example, entities mentioned in the query), and use community detection for hierarchical summarization. This supplies relational context that pure similarity search cannot capture.
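A minimal sketch of the subgraph-retrieval step, in Python. It assumes extraction has already produced (subject, relation, object) triples. The entities and relations below are hypothetical, and the breadth-first k-hop expansion is one simple way to pull a neighborhood around seed entities, not GraphRAG's own retrieval logic. Community detection and summarization are left out.

```python
from collections import defaultdict, deque

# (subject, relation, object). In a real pipeline these would come from
# LLM-based entity/relation extraction over the source documents.
Triple = tuple[str, str, str]


def build_graph(triples: list[Triple]) -> dict[str, list[Triple]]:
    """Index each triple under both endpoints so traversal can follow edges in either direction."""
    adj: dict[str, list[Triple]] = defaultdict(list)
    for s, r, o in triples:
        adj[s].append((s, r, o))
        if o != s:
            adj[o].append((s, r, o))
    return adj


def retrieve_subgraph(adj: dict[str, list[Triple]], seeds: list[str], hops: int = 2) -> list[Triple]:
    """Breadth-first expansion from the seed entities, collecting every triple
    within `hops` edges of a seed."""
    seen = set(seeds)
    edges: set[Triple] = set()
    # Only seeds that actually occur in the graph start a traversal.
    frontier = deque((seed, 0) for seed in seeds if seed in adj)
    while frontier:
        node, depth = frontier.popleft()
        if depth == hops:
            continue  # do not expand beyond the hop limit
        for s, r, o in adj[node]:
            edges.add((s, r, o))
            neighbor = o if node == s else s
            if neighbor not in seen:
                seen.add(neighbor)
                frontier.append((neighbor, depth + 1))
    return sorted(edges)


def to_context(edges: list[Triple]) -> str:
    """Render triples as plain sentences for the LLM prompt."""
    return "\n".join(f"{s} {r} {o}." for s, r, o in edges)


# Hypothetical extracted triples.
triples: list[Triple] = [
    ("Acme Corp", "acquired", "Beta Labs"),
    ("Beta Labs", "employs", "Dr. Chen"),
    ("Dr. Chen", "authored", "Patent 42"),
    ("Gamma Inc", "sued", "Acme Corp"),
]
adj = build_graph(triples)
print(to_context(retrieve_subgraph(adj, ["Acme Corp"], hops=2)))
```

With `hops=2`, seeding on "Acme Corp" reaches "Dr. Chen" through "Beta Labs", a connection no single triple states. Raising `hops` widens the context at the cost of more prompt tokens.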
Journey Context:
Naive RAG chunks documents, embeds each chunk, and retrieves the chunks whose embeddings have the highest cosine similarity to the query. This fails when the answer requires connecting information across documents or understanding entity relationships, because the chunk holding a needed intermediate fact may share little with the query text. Microsoft's GraphRAG showed that extracting an entity-relationship graph, partitioning it with Leiden community detection, and summarizing the communities at multiple levels substantially improves answers to questions that require connecting or summarizing information across a corpus. The tradeoff: graph construction is expensive upfront (LLM calls for entity and relation extraction), and the graph needs incremental maintenance as sources change. But for domains where answers require connecting dots across sources, such as legal analysis, scientific literature, and enterprise knowledge, this can be the difference between 30% and 80% accuracy on complex queries. Teams are now combining graph retrieval with vector retrieval in a hybrid pattern: vectors for local similarity, graphs for global reasoning.
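The hybrid pattern can be sketched as follows, in Python. Assumptions: the tiny hand-written vectors stand in for an embedding model, `chunk_entities` is a hypothetical entity-linking index built during extraction, and one-hop expansion from the entities in the top vector hits is an illustrative choice, not a prescribed GraphRAG algorithm.

```python
import math

Triple = tuple[str, str, str]  # (subject, relation, object)


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def vector_top_k(query_vec: list[float], chunk_vecs: dict[str, list[float]], k: int) -> list[str]:
    """Local retrieval: the k chunk ids most similar to the query."""
    return sorted(chunk_vecs, key=lambda cid: cosine(query_vec, chunk_vecs[cid]), reverse=True)[:k]


def one_hop_facts(triples: list[Triple], entities: set[str]) -> list[Triple]:
    """Graph retrieval: every triple that touches one of the given entities."""
    return sorted(t for t in triples if t[0] in entities or t[2] in entities)


def hybrid_retrieve(query_vec, chunk_vecs, chunk_entities, triples, k=1):
    """Vector hits supply the text; entities linked to those hits seed the graph expansion."""
    chunks = vector_top_k(query_vec, chunk_vecs, k)
    seeds: set[str] = set()
    for cid in chunks:
        seeds |= chunk_entities.get(cid, set())
    return chunks, one_hop_facts(triples, seeds)


# Hypothetical 3-d embeddings standing in for a real embedding model.
chunk_vecs = {
    "c1": [0.9, 0.1, 0.0],  # text about the Acme/Beta acquisition
    "c2": [0.1, 0.9, 0.0],  # text about Dr. Chen's patent
    "c3": [0.0, 0.1, 0.9],  # unrelated text
}
# Hypothetical entity-linking index: which extracted entities each chunk mentions.
chunk_entities = {"c1": {"Acme Corp", "Beta Labs"}, "c2": {"Dr. Chen", "Patent 42"}, "c3": set()}
triples: list[Triple] = [
    ("Acme Corp", "acquired", "Beta Labs"),
    ("Beta Labs", "employs", "Dr. Chen"),
    ("Dr. Chen", "authored", "Patent 42"),
    ("Gamma Inc", "sued", "Acme Corp"),
]

chunks, facts = hybrid_retrieve([1.0, 0.0, 0.0], chunk_vecs, chunk_entities, triples, k=1)
print(chunks)
for s, r, o in facts:
    print(f"{s} {r} {o}.")
```

Vector search alone returns only `c1`. The graph step adds that Beta Labs employs Dr. Chen, linking the acquisition to a person the query never mentions. Seeding the graph from entities in the top chunks, rather than from entities parsed out of the query, is one design choice among several.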
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T01:20:07.574133+00:00— report_created — created