Report #93365
[frontier] Vector similarity RAG fails on queries requiring multi-hop reasoning or thematic synthesis across documents
Implement GraphRAG for analytical queries: extract entities and relationships from documents into a knowledge graph, detect communities via graph algorithms, generate community-level summaries, and use these summaries as retrieval units. Keep vector RAG for simple factoid lookup; add GraphRAG as a second retrieval path for synthesis queries.
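The indexing steps described above can be sketched roughly as follows. This is a minimal illustration using only the standard library, not a reference implementation: the `TRIPLES` data is hypothetical and stands in for the output of an LLM extraction step, connected components are used as a simple stand-in for a real community-detection algorithm such as Leiden or Louvain, and `summarize_community` is a placeholder where an LLM summarization call would go.

```python
from collections import defaultdict

# Hypothetical (subject, relation, object) triples, standing in for what an
# LLM-based entity/relationship extraction step might emit.
TRIPLES = [
    ("Project Atlas", "depends_on", "Billing API"),
    ("Billing API", "owned_by", "Payments Team"),
    ("Project Borealis", "uses", "Search Index"),
    ("Search Index", "owned_by", "Platform Team"),
]

def build_graph(triples):
    """Build an undirected adjacency map from the extracted triples."""
    graph = defaultdict(set)
    for subj, _rel, obj in triples:
        graph[subj].add(obj)
        graph[obj].add(subj)
    return graph

def detect_communities(graph):
    """Group entities by connected component.

    Assumption: this is a simplified stand-in for community detection;
    a production pipeline would likely use Leiden or Louvain instead.
    """
    seen, communities = set(), []
    for start in sorted(graph):  # sorted for deterministic ordering
        if start in seen:
            continue
        stack, members = [start], set()
        while stack:
            node = stack.pop()
            if node in members:
                continue
            members.add(node)
            stack.extend(graph[node] - members)
        seen |= members
        communities.append(members)
    return communities

def summarize_community(members, triples):
    """Placeholder for an LLM call: lists the facts inside one community."""
    facts = [f"{s} {r} {o}" for s, r, o in triples if s in members and o in members]
    return "; ".join(facts)

def build_summaries(triples):
    """Run the sketch end to end: graph -> communities -> summaries."""
    graph = build_graph(triples)
    return [summarize_community(c, triples) for c in detect_communities(graph)]

if __name__ == "__main__":
    for summary in build_summaries(TRIPLES):
        print(summary)
```

Each returned summary is one retrieval unit. In a real system these would be embedded or indexed so the synthesis path can retrieve whole communities rather than individual chunks.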
Journey Context:
Vector RAG embeds chunks and retrieves them by similarity. This works for 'what is X?' but fails for 'what are the common themes across all project reports?', because the answer requires reasoning across many documents rather than finding one similar chunk. GraphRAG preserves the relational structure between entities that vector embeddings flatten. The indexing pipeline is significantly more expensive because every stage (entity extraction, relationship extraction, community detection, community summarization) is LLM-powered. Tradeoff: indexing costs 5-10x more than vector RAG, but it enables queries that vector RAG simply cannot answer. The emerging pattern is a dual retrieval system: vector RAG for lookup, GraphRAG for analysis, and a router that classifies each query and picks the path.
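The router in the dual retrieval pattern could look something like the sketch below. Everything here is an assumption for illustration: the `SYNTHESIS_CUES` keyword list is a hypothetical heuristic (a real router would more likely use an LLM or a trained classifier), and `vector_retriever` and `graph_retriever` are hypothetical callables supplied by the caller.

```python
import re

# Hypothetical cue words suggesting a synthesis query; a real router would
# more likely use an LLM or a trained classifier than a keyword list.
SYNTHESIS_CUES = {"themes", "across", "compare", "trends", "overall", "summarize", "common"}

def classify_query(query):
    """Return "graph" for synthesis-style queries, "vector" for lookups."""
    words = set(re.findall(r"[a-z]+", query.lower()))
    return "graph" if words & SYNTHESIS_CUES else "vector"

def route(query, vector_retriever, graph_retriever):
    """Dispatch the query to one retrieval path and report which one ran.

    Both retrievers are hypothetical callables that take a query string.
    """
    path = classify_query(query)
    retriever = graph_retriever if path == "graph" else vector_retriever
    return path, retriever(query)

if __name__ == "__main__":
    # Stub retrievers so the sketch runs without a real index.
    vector = lambda q: ["chunk most similar to the query"]
    graph = lambda q: ["community summary relevant to the query"]
    print(route("What is the Billing API rate limit?", vector, graph))
    print(route("What are the common themes across all project reports?", vector, graph))
```

Returning the chosen path alongside the results makes misrouted queries easy to log and audit, which matters because a wrong classification sends a synthesis query down the path that cannot answer it.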
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T15:18:01.468845+00:00— report_created — created