Report #86125
[frontier] Vector similarity retrieval fails on multi-hop reasoning and implicit relationship queries
Implement GraphRAG: index documents into a knowledge graph (entities and relations), detect hierarchical communities in that graph with the Leiden algorithm, and have an LLM write a summary for each community. At query time, run 'global search' against the community summaries for high-level synthesis instead of relying only on local entity retrieval.
Journey Context:
Naive RAG retrieves chunks whose embeddings are similar to the query, so it misses connections that no single chunk states outright (e.g., 'What factors caused both X and Y?'). GraphRAG uses an LLM to extract entities and relationships as structured output, builds a graph from them, and then detects communities (clusters of densely connected entities). Crucially, it generates a natural-language summary of each community at multiple hierarchical levels. For abstract queries, it searches these summaries (global search) rather than individual entities, which lets an answer draw on facts that originate in separate documents with no direct textual overlap. Tradeoff: indexing is expensive (LLM calls for both entity extraction and summarization), so it pays off mainly on enterprise datasets where relationships matter more than keyword matching. The alternative, hybrid search with reranking, improves the ranking of individual chunks but does not close the reasoning gap across documents.
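The flow described above can be sketched in a few lines of standard-library Python. This is a toy illustration under stated assumptions, not the GraphRAG implementation: the triples in `TRIPLES` are invented sample data standing in for LLM extraction, `detect_communities` uses connected components as a simple stand-in for Leiden, `summarize_community` concatenates facts instead of calling an LLM, and `global_search` does keyword matching over a single level of summaries rather than a hierarchical map-reduce. All function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical (subject, relation, object, source_doc) triples.
# In a real pipeline an LLM would extract these from document chunks.
TRIPLES = [
    ("drought", "reduced", "crop yield", "doc_a"),
    ("crop yield", "raised", "food prices", "doc_b"),
    ("drought", "lowered", "reservoir levels", "doc_c"),
    ("reservoir levels", "cut", "hydropower output", "doc_d"),
    ("chip shortage", "delayed", "car production", "doc_e"),
]


def build_graph(triples):
    """Build an undirected adjacency map over entities."""
    graph = defaultdict(set)
    for subj, _rel, obj, _doc in triples:
        graph[subj].add(obj)
        graph[obj].add(subj)
    return graph


def detect_communities(graph):
    """Stand-in for Leiden: group entities by connected component."""
    seen, communities = set(), []
    for start in sorted(graph):
        if start in seen:
            continue
        stack, members = [start], set()
        while stack:
            node = stack.pop()
            if node in members:
                continue
            members.add(node)
            stack.extend(graph[node] - members)
        seen |= members
        communities.append(members)
    return communities


def summarize_community(members, triples):
    """Stand-in for an LLM summary: join the facts inside the community."""
    facts = [f"{s} {r} {o}" for s, r, o, _ in triples
             if s in members and o in members]
    return "; ".join(facts)


def global_search(query_terms, summaries):
    """Toy global search: keep summaries that mention every query term."""
    return [s for s in summaries if all(t in s for t in query_terms)]


graph = build_graph(TRIPLES)
communities = detect_communities(graph)
summaries = [summarize_community(c, TRIPLES) for c in communities]

# No single source document mentions both effects, but one community
# summary links them through their shared cause.
hits = global_search(["food prices", "hydropower output"], summaries)
for hit in hits:
    print(hit)
```

In this sample, the two effects come from four different source documents, yet they land in the same community summary because both trace back to the same entity. A chunk-level similarity search would have to retrieve and connect all four chunks on its own to reach the same answer.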
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T03:09:12.409232+00:00— report_created — created