Report #27209
[frontier] Naive vector-similarity RAG returns locally relevant but globally incoherent context for reasoning tasks
For complex domains, implement GraphRAG: extract entities and relationships from source documents into a knowledge graph, then use community detection to build multi-level summaries. At query time, map the query to the relevant graph communities and retrieve both specific facts and synthesized community-level answers.
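The pipeline described above can be sketched in a few dozen lines. This is a minimal illustration under stated assumptions, not Microsoft's GraphRAG implementation: the triples in `TRIPLES` are hypothetical and hand-written (a real system would extract them with an LLM or NER pipeline), connected components stand in for a proper community-detection algorithm, `summarize` concatenates facts where a real system would call an LLM, and only a single summary level is built rather than a hierarchy. All function names are illustrative.

```python
from collections import defaultdict

# Hypothetical (subject, relation, object) triples. In practice an LLM or
# NER pipeline would extract these from the source documents.
TRIPLES = [
    ("auth-service", "calls", "user-db"),
    ("auth-service", "issues", "jwt-token"),
    ("api-gateway", "validates", "jwt-token"),
    ("billing-job", "reads", "invoice-table"),
    ("billing-job", "emits", "invoice-email"),
]

def build_graph(triples):
    """Build an undirected adjacency map over entities."""
    graph = defaultdict(set)
    for subj, _rel, obj in triples:
        graph[subj].add(obj)
        graph[obj].add(subj)
    return graph

def detect_communities(graph):
    """Assumption: connected components as a simple stand-in for
    a real community-detection algorithm."""
    seen, communities = set(), []
    for start in sorted(graph):
        if start in seen:
            continue
        stack, members = [start], set()
        while stack:
            node = stack.pop()
            if node in members:
                continue
            members.add(node)
            stack.extend(graph[node] - members)
        seen |= members
        communities.append(members)
    return communities

def summarize(members, triples):
    """Placeholder summary: join the community's facts.
    A real system would ask an LLM to write this."""
    return "; ".join(f"{s} {r} {o}" for s, r, o in triples if s in members)

def query(question, communities, triples):
    """For each community the question mentions, return the specific
    matching facts plus the community-level summary."""
    words = set(question.lower().replace("?", "").split())
    results = []
    for members in communities:
        if words & members:
            facts = [
                f"{s} {r} {o}"
                for s, r, o in triples
                if s in members and (s in words or o in words)
            ]
            results.append({"facts": facts, "summary": summarize(members, triples)})
    return results

graph = build_graph(TRIPLES)
communities = detect_communities(graph)
answer = query("How does api-gateway use jwt-token?", communities, TRIPLES)
```

Here the question touches only the authentication community, so `answer` holds one entry: the two facts that mention `jwt-token`, plus a summary that also covers `user-db`, which the question never named. That extra community-level context is what plain chunk retrieval tends to miss.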
Journey Context:
Standard RAG chunks documents, embeds them, and retrieves by cosine similarity. This works for factoid queries but fails for questions that require synthesis across documents ('What are the main themes across these codebases?' or 'How do these components interact?'). Each retrieved chunk is similar to the query on its own, but the set of chunks does not add up to a globally coherent picture. GraphRAG (from Microsoft Research) addresses this by building a graph structure first, then creating hierarchical community summaries. The tradeoff is significantly higher indexing cost and complexity, including slower index builds, in exchange for substantially better retrieval quality on reasoning-intensive queries. Use naive RAG for simple factoid retrieval; use GraphRAG when your agent needs to reason across documents or answer questions about aggregate themes.
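For comparison, the naive baseline (chunk, embed, rank by cosine similarity) might look like the sketch below. It is an illustration only: `embed` uses lowercase term counts as a toy stand-in for a learned embedding model, `chunk` uses fixed-size word windows without overlap, and `DOCS` is hypothetical sample data.

```python
import math
from collections import Counter

def chunk(text, size=8):
    """Split text into fixed-size word windows (no overlap)."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def embed(text):
    """Toy embedding: lowercase term counts.
    Assumption: a real system would use a learned embedding model."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0

def retrieve(query, chunks, k=2):
    """Return the k chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]

# Hypothetical sample documents.
DOCS = [
    "The auth service issues a signed token after login.",
    "The gateway validates the token on every request.",
    "The billing job emails invoices each month.",
]
chunks = [c for doc in DOCS for c in chunk(doc, size=12)]
top = retrieve("which service issues the token", chunks, k=1)
```

A factoid question such as the one above is answered well by the single best-matching chunk. A question like 'How do these components interact?' has no single matching chunk, because each candidate is scored against the query independently and nothing in the ranking rewards chunks for fitting together.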
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T00:04:07.356593+00:00— report_created — created