Report #31621
[frontier] Flat RAG retrieval failing to answer high-level overview questions requiring thematic synthesis across entire document corpus
For corpus-level questions, replace vector similarity search with GraphRAG: build a knowledge graph (entities, relationships, claims) from the source documents, detect communities of densely connected nodes and summarize each one, then use global search (map-reduce summarization across the community summaries) for abstract queries and local search (traversal outward from the entities a query mentions) for specific facts.
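The steps above can be sketched in a few lines of plain Python. This is a toy illustration under stated assumptions, not the GraphRAG library's API: `TRIPLES` is hand-written sample data standing in for LLM entity extraction, `detect_communities` uses connected components as a stand-in for Leiden, and `summarize` is a placeholder for an LLM call. All function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical pre-extracted (source, relation, target) triples. In a real
# pipeline an LLM would extract these from document chunks.
TRIPLES = [
    ("GraphRAG", "uses", "Leiden"),
    ("GraphRAG", "builds", "knowledge graph"),
    ("Leiden", "detects", "communities"),
    ("RAPTOR", "builds", "summary tree"),
    ("summary tree", "organizes", "chunks"),
]


def build_graph(triples):
    """Undirected adjacency map: entity -> {neighbor: original triple}."""
    graph = defaultdict(dict)
    for src, rel, dst in triples:
        graph[src][dst] = (src, rel, dst)
        graph[dst][src] = (src, rel, dst)
    return graph


def detect_communities(graph):
    """Stand-in for Leiden (assumption): group nodes by connected component."""
    seen, communities = set(), []
    for start in sorted(graph):
        if start in seen:
            continue
        stack, members = [start], set()
        while stack:
            node = stack.pop()
            if node in members:
                continue
            members.add(node)
            stack.extend(n for n in graph[node] if n not in members)
        seen |= members
        communities.append(sorted(members))
    return communities


def summarize(texts):
    """Placeholder for an LLM summarization call: just joins the inputs."""
    return "; ".join(texts)


def community_summaries(triples, communities):
    """Index step: one summary per community, built from its facts."""
    summaries = []
    for members in communities:
        member_set = set(members)
        facts = [f"{s} {r} {d}" for s, r, d in triples if s in member_set]
        summaries.append(summarize(facts))
    return summaries


def global_search(query, summaries):
    """Map-reduce over community summaries for abstract queries."""
    # Map: a real system would prompt an LLM with the query plus one summary.
    # The placeholder ignores the query and passes each summary through.
    partial_answers = [summarize([s]) for s in summaries]
    # Reduce: combine the partial answers into one response.
    return summarize(partial_answers)


def local_search(graph, entity, hops=1):
    """Traverse outward from one entity, collecting the facts encountered."""
    frontier, visited, facts = [entity], {entity}, []
    for _ in range(hops):
        next_frontier = []
        for node in frontier:
            for neighbor, triple in sorted(graph.get(node, {}).items()):
                if triple not in facts:
                    facts.append(triple)
                if neighbor not in visited:
                    visited.add(neighbor)
                    next_frontier.append(neighbor)
        frontier = next_frontier
    return facts
```

In this sketch, `global_search` touches every community summary, so its cost grows with the number of communities, while `local_search` only reads the neighborhood of one entity. That difference is the reason for splitting queries into the two modes.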
Journey Context:
Standard RAG (chunk -> embed -> similarity search) fails on questions that require synthesis across documents (e.g., 'What are the main themes across these 100 research papers?') because top-k retrieval returns only a handful of chunks and no single chunk carries corpus-wide context. GraphRAG (Microsoft Research) constructs an entity-relationship graph, detects a hierarchy of communities with the Leiden algorithm, and generates a summary for every community at each level. Global queries are answered from these summaries rather than from raw chunks, which lets the system address abstract questions that plain vector similarity search handles poorly. The cost is higher indexing compute (LLM calls to extract entities and write summaries) and more storage. Hybrid approaches route global queries to GraphRAG and specific lookups to vector search. RAPTOR is an alternative that recursively clusters and summarizes chunks into a tree rather than building a graph.
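A hybrid setup needs some way to decide which path a query takes. The sketch below is one possible approach, assuming a simple keyword heuristic. The cue list and the names `classify_query` and `route` are hypothetical and not part of GraphRAG, and a production system might use an LLM classifier instead.

```python
# Hypothetical cue phrases that suggest a corpus-level (global) question.
GLOBAL_CUES = ("main themes", "overall", "overview", "summarize", "across", "trends")


def classify_query(query):
    """Return "global" for corpus-level questions, otherwise "local"."""
    lowered = query.lower()
    return "global" if any(cue in lowered for cue in GLOBAL_CUES) else "local"


def route(query, global_search, vector_search):
    """Dispatch to the GraphRAG global path or the vector path.

    Both handlers are caller-supplied callables that take the query string.
    """
    handler = global_search if classify_query(query) == "global" else vector_search
    return handler(query)
```

For example, `classify_query("What are the main themes across these 100 research papers?")` picks the global path, while a question naming a single entity or fact falls through to vector search.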
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T07:27:44.796223+00:00— report_created — created