Report #81984
[frontier] RAG pipeline returns irrelevant or contradictory results in production
Replace naive vector-similarity RAG with graph-augmented retrieval: build a knowledge graph from source documents, use community detection for hierarchical summarization, and traverse relationships for multi-hop queries instead of relying solely on embedding cosine similarity
Journey Context:
Naive RAG chunks documents, embeds them, and retrieves the top-k chunks by similarity. This fails on multi-hop questions (e.g., 'What are the implications of finding X for project Y?') because the answer spans several chunks that need not be similar to the query, or to each other, in embedding space. It can also return chunks that each look relevant on their own but contradict one another when read together. GraphRAG extracts entities and relationships from documents to build a knowledge graph, then uses community detection to create hierarchical summaries. Queries traverse the graph, not just the embedding space. Tradeoff: indexing is 5-10x more expensive and slower, but recall on complex queries improves substantially. For simple lookup queries, naive RAG still works; use GraphRAG when your queries require synthesis across documents.
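To make the multi-hop idea concrete, here is a minimal sketch of graph traversal as a retrieval step. It is not the GraphRAG implementation. It assumes entity and relationship extraction has already happened and produced (subject, relation, object, chunk_id) triples. The triples, entity names, chunk ids, and the helpers `build_graph` and `multi_hop_chunks` are all hypothetical, chosen only for illustration. Community detection and summarization are left out.

```python
from collections import defaultdict, deque

# Hypothetical (subject, relation, object, chunk_id) triples. In a real
# pipeline these would come from an entity/relation extraction step.
TRIPLES = [
    ("finding X", "reported_in", "study A", "chunk-1"),
    ("study A", "informs", "design decision B", "chunk-2"),
    ("design decision B", "affects", "project Y", "chunk-3"),
    ("project Z", "uses", "library Q", "chunk-4"),
]


def build_graph(triples):
    """Build an undirected adjacency map: entity -> [(neighbor, relation, chunk_id)]."""
    graph = defaultdict(list)
    for subj, rel, obj, chunk_id in triples:
        graph[subj].append((obj, rel, chunk_id))
        graph[obj].append((subj, rel, chunk_id))
    return graph


def multi_hop_chunks(graph, start, goal, max_hops=4):
    """Breadth-first search from start to goal.

    Returns the chunk ids along the shortest path found, or [] if the
    goal is not reachable within max_hops edges.
    """
    queue = deque([(start, [])])
    visited = {start}
    while queue:
        entity, path = queue.popleft()
        if entity == goal:
            return path
        if len(path) >= max_hops:
            continue  # hop budget spent; do not expand further
        for neighbor, _rel, chunk_id in graph.get(entity, []):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append((neighbor, path + [chunk_id]))
    return []


graph = build_graph(TRIPLES)
print(multi_hop_chunks(graph, "finding X", "project Y"))
# prints ['chunk-1', 'chunk-2', 'chunk-3']
```

The three returned chunks share no wording with each other or with the question, so top-k similarity search could easily miss the middle one. The graph path links them through shared entities. The `max_hops` cap is an assumed guard that keeps traversal cost bounded on large graphs.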
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T20:12:16.099516+00:00— report_created — created