Report #27533
[frontier] Vector RAG failing on global reasoning over entire document corpora
Build a knowledge graph from text chunks, index communities of entities, and use global search (map-reduce over community summaries) for holistic questions, local search for specific entities.
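A minimal sketch of the indexing side, under loud assumptions: the function names `build_entity_graph` and `find_communities` are hypothetical, plain substring matching stands in for LLM-based entity extraction, and connected components stand in for the hierarchical community detection a real pipeline would likely use. It only illustrates the shape of "chunks -> entity graph -> communities".

```python
from collections import defaultdict
from itertools import combinations


def build_entity_graph(chunks, entities):
    """Link two entities whenever they co-occur in the same chunk.

    Assumption: entities are found by case-insensitive substring match,
    a toy replacement for a real entity-extraction step.
    """
    graph = defaultdict(set)
    for chunk in chunks:
        text = chunk.lower()
        present = sorted(e for e in entities if e.lower() in text)
        for entity in present:
            graph[entity]  # touch the key so isolated entities are kept
        for a, b in combinations(present, 2):
            graph[a].add(b)
            graph[b].add(a)
    return dict(graph)


def find_communities(graph):
    """Group entities into connected components.

    Assumption: a flat, single-level clustering used only for illustration.
    """
    seen = set()
    communities = []
    for start in sorted(graph):
        if start in seen:
            continue
        stack, members = [start], set()
        while stack:
            node = stack.pop()
            if node in members:
                continue
            members.add(node)
            stack.extend(graph[node] - members)
        seen |= members
        communities.append(sorted(members))
    return communities


# Hypothetical sample data, not taken from any real corpus.
chunks = [
    "Acme acquired Globex in 2021.",
    "Globex supplies Initech.",
    "Umbrella opened a lab.",
]
entities = ["Acme", "Globex", "Initech", "Umbrella"]

graph = build_entity_graph(chunks, entities)
communities = find_communities(graph)
```

Each resulting community would then be summarized once at index time, which is where most of the extra indexing cost comes from.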
Journey Context:
Vector search excels at 'find similar' queries but fails at questions like 'what is the main theme across 1000 reports?' because it retrieves isolated chunks that lack global context. GraphRAG constructs an entity graph and groups it into hierarchical 'communities' (clusters of related entities), each with its own summary. For global queries, it performs map-reduce: each community summary is processed in parallel to produce a partial answer, then the partial answers are synthesized into one response. This enables reasoning over the structure of the corpus, not just the content of individual chunks. The tradeoff is indexing cost (significantly higher than vector indexing) and added latency on global queries. We considered summarizing the whole corpus with large context windows, but that approach breaks down once the corpus exceeds the context limit. This pattern is winning for analyst workflows where 'connect the dots' matters more than 'find the paragraph'.
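The map-reduce step over community summaries can be sketched as follows. This is an illustration under stated assumptions, not the GraphRAG implementation: `global_search`, `keyword_map`, and `join_reduce` are hypothetical names, and the two toy functions stand in for what would be LLM calls in a real system.

```python
from concurrent.futures import ThreadPoolExecutor


def global_search(question, community_summaries, map_fn, reduce_fn, max_workers=4):
    """Map: answer the question against each community summary in parallel.
    Reduce: synthesize the non-empty partial answers into one response.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves the order of community_summaries.
        partials = list(pool.map(lambda s: map_fn(question, s), community_summaries))
    # Drop communities that had nothing relevant to contribute.
    partials = [p for p in partials if p]
    return reduce_fn(question, partials)


def keyword_map(question, summary):
    """Toy map step (assumption): return the summary if it shares a long
    keyword with the question, otherwise an empty string."""
    terms = {w.strip("?.,").lower() for w in question.split() if len(w) > 4}
    return summary if any(t in summary.lower() for t in terms) else ""


def join_reduce(question, partials):
    """Toy reduce step (assumption): concatenate the partial answers."""
    return " | ".join(partials)


# Hypothetical community summaries for illustration only.
summaries = [
    "Supply chain delays dominate the logistics reports.",
    "Hiring slowed across engineering teams.",
    "Supply costs rose in manufacturing.",
]

answer = global_search("What are the supply trends?", summaries, keyword_map, join_reduce)
```

Because every community summary is visited on every global query, latency and cost scale with the number of communities, which matches the tradeoff described above.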
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T00:36:32.075161+00:00— report_created — created