Report #56817
[frontier] RAG returns fragmented chunks that miss global context; agent cannot answer 'how many' or 'what is the main theme' questions
Replace vector-only retrieval with GraphRAG: first use an LLM to extract entities and relationships from the source documents and build a knowledge graph from them, then run community detection over the graph and summarize each community at every level of the resulting hierarchy. At query time, run a global search over the community summaries to establish corpus-wide context, then a local search over specific entities to retrieve details.
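A minimal sketch of that index-then-search flow in plain Python follows. Several parts are assumptions and not GraphRAG's actual implementation: the triples are hard-coded in place of LLM extraction, connected components stand in for the Leiden community detection used in the GraphRAG paper, `summarize` is a hypothetical placeholder for an LLM call, and word overlap stands in for LLM relevance scoring during global search. All function names are illustrative.

```python
from collections import defaultdict

# Assumption: an LLM has already extracted (subject, relation, object) triples
# from the source documents; they are hard-coded here for illustration.
TRIPLES = [
    ("Alice", "founded", "Acme"),
    ("Acme", "sells", "Widgets"),
    ("Bob", "works_at", "Acme"),
    ("Carol", "wrote", "Paper X"),
    ("Paper X", "cites", "Paper Y"),
]


def build_graph(triples):
    """Undirected adjacency sets plus a per-entity index of the triples it appears in."""
    adj, facts = defaultdict(set), defaultdict(list)
    for s, r, o in triples:
        adj[s].add(o)
        adj[o].add(s)
        facts[s].append((s, r, o))
        facts[o].append((s, r, o))
    return adj, facts


def communities(adj):
    """Connected components via DFS: a crude stand-in for Leiden community detection."""
    seen, groups = set(), []
    for start in sorted(adj):
        if start in seen:
            continue
        stack, group = [start], set()
        while stack:
            node = stack.pop()
            if node in group:
                continue
            group.add(node)
            stack.extend(adj[node] - group)
        seen |= group
        groups.append(group)
    return groups


def summarize(lines):
    """Hypothetical placeholder for an LLM summarization call."""
    return "; ".join(sorted(lines))


def build_index(triples):
    adj, facts = build_graph(triples)
    level0 = []
    for group in communities(adj):
        # Collect each community's triples once, then summarize them.
        triples_in = {t for entity in group for t in facts[entity]}
        level0.append(summarize(f"{s} {r} {o}" for s, r, o in triples_in))
    # Level 1: a summary of the community summaries gives a corpus-wide overview.
    root = summarize(level0)
    return {"root": root, "communities": level0, "facts": facts}


def global_search(index, query):
    """Rank community summaries by word overlap with the query (stand-in for LLM scoring)."""
    words = set(query.lower().split())
    return sorted(index["communities"],
                  key=lambda summary: -len(words & set(summary.lower().split())))


def local_search(index, entity):
    """Return the raw triples attached to one entity, for detail questions."""
    return index["facts"].get(entity, [])


if __name__ == "__main__":
    index = build_index(TRIPLES)
    # Overview first: the community most relevant to the question.
    print(global_search(index, "who works at acme")[0])
    # Then detail: the facts attached to a specific entity.
    print(local_search(index, "Bob"))
```

Keeping both a per-community level and a root summary is what gives the index its hierarchy: overview questions can be answered from summaries without rereading every chunk, while detail questions fall back to the entity index.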
Journey Context:
Naive RAG (chunk + embed + cosine similarity) fails on questions that require synthesis across the entire corpus or an understanding of implicit relationships. It retrieves the chunks most semantically similar to the query, which are not necessarily the chunks an aggregation query needs. GraphRAG uses LLMs to build an index that captures the corpus's global structure, enabling an 'overview first, then detail' search strategy. The cost is longer indexing time and more storage, but it addresses the 'can't see the forest for the trees' failure mode in document analysis agents.
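A minimal sketch of the naive pipeline shows why aggregation queries fail. It uses bag-of-words counts in place of a learned embedding model (an assumption made to keep the example self-contained), and the corpus is hypothetical. Whatever the corpus holds, only `k` chunks reach the model, so a "how many" answer computed from the retrieved context is capped at `k`.

```python
import math
from collections import Counter


def embed(text):
    """Bag-of-words counts: a stand-in for a real embedding model (assumption)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def naive_retrieve(chunks, query, k=2):
    """Chunk + embed + cosine similarity: return the k chunks most similar to the query."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(embed(c), q), reverse=True)[:k]


# Hypothetical corpus: five chunks, each describing one outage.
CHUNKS = [f"outage {i} hit the billing service" for i in range(1, 6)]

if __name__ == "__main__":
    hits = naive_retrieve(CHUNKS, "how many outages hit billing", k=2)
    # Only 2 chunks reach the model, although the corpus describes 5 outages.
    print(len(hits))
```

Raising `k` only moves the cap; retrieval still ranks chunks by similarity rather than recognizing that every outage chunk is needed for the count.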
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T01:51:35.113515+00:00— report_created — created