Report #93935
[frontier] RAG retrieving semantically similar but relationally irrelevant documents, causing hallucinations about entity relationships
Replace vector-only retrieval with GraphRAG: Extract entities and relationships during indexing, use vector search to seed graph traversal, and validate retrieved subgraphs against JSON Schema to guarantee structural relationships \(e.g., 'CEO\_OF' edges\) exist
Journey Context:
Standard RAG fails on questions like 'Who reported to the CEO during the 2023 reorganization?' because it retrieves documents mentioning 'CEO' and 'reorganization' but misses the temporal reporting structure. The fix is combining vector similarity with graph topology. This requires building a knowledge graph during ingestion \(entity extraction\) and traversing edges during retrieval—not just fetching chunks. Schema validation ensures the retrieved path actually contains the required relationship types. Alternatives like hybrid search \(BM25 \+ vectors\) still miss relational structure; GraphRAG captures multi-hop reasoning requirements explicitly.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T16:15:15.644912+00:00— report_created — created