Report #92328
[frontier] Vector-only RAG retrieves contextually similar but logically irrelevant chunks, missing relational and hierarchical information
Replace pure vector similarity retrieval with graph-augmented retrieval. Extract entities and relationships from documents, store them in a knowledge graph alongside vector embeddings. At query time, combine vector similarity search with graph traversal: find the most relevant entity via embedding, then traverse its relationships to retrieve structurally connected context. Use community detection to pre-compute summary nodes for multi-hop reasoning.
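The query-time flow described above (embedding lookup for the entry entity, then relationship traversal) can be sketched roughly as follows. This is a minimal illustration, not a reference implementation: the entity names, the three-dimensional vectors, the `reports_to` edges, and the helper names `entry_entity`, `traverse`, and `retrieve` are all hypothetical, and a real system would use a vector store and a graph database instead of in-memory structures.

```python
import math
from collections import deque

# Toy data: every name, vector, and edge here is made up for illustration.
EMBEDDINGS = {
    "CTO": [0.9, 0.1, 0.0],
    "VP Engineering": [0.7, 0.3, 0.1],
    "Office Lease": [0.0, 0.2, 0.9],
}

# Directed edges stored as (source, relation, target) triples.
EDGES = [
    ("VP Engineering", "reports_to", "CTO"),
    ("Staff Engineer", "reports_to", "VP Engineering"),
    ("CTO", "reports_to", "CEO"),
]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def entry_entity(query_vec):
    """Step 1: pick the entity whose embedding is closest to the query."""
    return max(EMBEDDINGS, key=lambda name: cosine(query_vec, EMBEDDINGS[name]))


def traverse(start, max_hops=2):
    """Step 2: breadth-first walk over edges in both directions, up to max_hops."""
    seen = {start}
    facts = []
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_hops:
            continue  # do not expand beyond the hop limit
        for src, rel, dst in EDGES:
            if node in (src, dst):
                if (src, rel, dst) not in facts:
                    facts.append((src, rel, dst))
                other = dst if node == src else src
                if other not in seen:
                    seen.add(other)
                    queue.append((other, depth + 1))
    return facts


def retrieve(query_vec, max_hops=2):
    """Combine both steps: vector entry point, then connected triples."""
    start = entry_entity(query_vec)
    return start, traverse(start, max_hops)
```

Calling `retrieve(query_vec, max_hops=1)` returns only the edges touching the entry entity; raising `max_hops` widens the context at the cost of pulling in less relevant triples, so the hop limit is a tuning choice rather than a fixed value.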
Journey Context:
Pure vector RAG finds text that 'sounds similar' to the query but has no understanding of structure. Ask 'who reports to the CTO?' and you get chunks mentioning 'CTO' and 'reports' but not the actual reporting chain. Ask 'what changed since last quarter?' and you get semantically similar text with no temporal ordering. Graph-augmented RAG (GraphRAG) addresses this by building a knowledge graph from the source documents: entities as nodes, relationships as edges. Retrieval then becomes vector search to find the entry point, plus graph traversal to pull in connected context. Microsoft's GraphRAG implementation adds community detection: it clusters the graph into communities, pre-computes a summary for each community, and uses those summaries for global queries that span multiple documents. The cost: entity extraction is expensive (multiple LLM calls per document) and the graph must be maintained as documents change. The benefit: dramatically better recall for relational, multi-hop, and comparative queries. This approach is increasingly replacing naive RAG for knowledge bases of more than about 100 documents.
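The community-summary step can be illustrated with a small sketch. It rests on loud assumptions: connected components stand in for a real community-detection algorithm (a production pipeline would use a proper clustering method such as Leiden), and `summarize` simply joins the triples where a real pipeline would call an LLM. The edges and the function names `communities`, `summarize`, and `build_summary_nodes` are hypothetical.

```python
from collections import defaultdict

# Toy triples, made up for illustration: (source, relation, target).
EDGES = [
    ("VP Engineering", "reports_to", "CTO"),
    ("Staff Engineer", "reports_to", "VP Engineering"),
    ("Office Lease", "managed_by", "Facilities"),
]


def communities(edges):
    """Group nodes into connected components.

    This is a crude stand-in for real community detection, not the
    algorithm any particular GraphRAG implementation uses.
    """
    neighbors = defaultdict(set)
    for src, _, dst in edges:
        neighbors[src].add(dst)
        neighbors[dst].add(src)
    seen, groups = set(), []
    for node in sorted(neighbors):  # sorted for deterministic output
        if node in seen:
            continue
        stack, group = [node], set()
        while stack:
            current = stack.pop()
            if current in group:
                continue
            group.add(current)
            stack.extend(neighbors[current] - group)
        seen |= group
        groups.append(group)
    return groups


def summarize(group, edges):
    """Placeholder for an LLM-written summary: joins the facts inside one community."""
    facts = [f"{s} {r} {d}" for s, r, d in edges if s in group and d in group]
    return "; ".join(sorted(facts))


def build_summary_nodes(edges):
    """Pre-compute one summary node per community, ahead of query time."""
    return [
        {"members": sorted(group), "summary": summarize(group, edges)}
        for group in communities(edges)
    ]
```

Because the summaries are built at indexing time, a global query can be answered from the summary nodes without traversing every document; the trade-off is that they go stale and must be rebuilt when the graph changes.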
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T13:33:49.804590+00:00— report_created — created