Report #31630
[frontier] RAG retrieval returning disconnected chunks missing causal relationships
Implement RAPTOR (Recursive Abstractive Processing for Tree-Organized Retrieval): cluster text chunks, summarize each cluster, and repeat recursively up a tree hierarchy (leaf = original chunk, root = global summary). At query time, retrieve across multiple tree levels, either by top-down tree traversal or by a collapsed-tree search over all nodes at once, to answer both specific and abstract queries.
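The build step can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the reference implementation: `Node`, `embed`, `cluster`, `summarize`, and `build_tree` are hypothetical names, the embedding is a toy bag-of-words count, the clustering is a simple greedy nearest-neighbour grouping (the RAPTOR paper uses Gaussian mixture clustering over real embeddings), and `summarize` is a placeholder where an LLM call would go.

```python
import math
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Node:
    text: str
    level: int                      # 0 = leaf chunk, higher = more abstract
    children: list = field(default_factory=list)


def embed(text):
    """Toy embedding (assumption): word counts instead of a real embedding model."""
    return Counter(text.lower().split())


def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def cluster(nodes, size):
    """Greedy stand-in for real clustering: seed with the first unassigned
    node, then group it with its size-1 most similar remaining nodes."""
    remaining = list(nodes)
    clusters = []
    while remaining:
        seed = remaining.pop(0)
        seed_vec = embed(seed.text)
        remaining.sort(key=lambda n: cosine(seed_vec, embed(n.text)), reverse=True)
        clusters.append([seed] + remaining[: size - 1])
        remaining = remaining[size - 1:]
    return clusters


def summarize(texts):
    """Placeholder for an LLM summarization call (assumption): joins the texts."""
    return " | ".join(texts)


def build_tree(chunks, cluster_size=2):
    """Cluster the current level, summarize each cluster into a parent node,
    and repeat until a single root remains."""
    if not chunks:
        raise ValueError("need at least one chunk")
    if cluster_size < 2:
        raise ValueError("cluster_size must be >= 2 so each level shrinks")
    level_nodes = [Node(text=c, level=0) for c in chunks]
    level = 0
    while len(level_nodes) > 1:
        level += 1
        level_nodes = [
            Node(text=summarize([n.text for n in group]), level=level, children=group)
            for group in cluster(level_nodes, cluster_size)
        ]
    return level_nodes[0]


chunks = [
    "cats chase mice",
    "mice eat cheese",
    "rain fills rivers",
    "rivers reach the sea",
]
root = build_tree(chunks)
print(root.level, root.text)
```

Each pass roughly halves the number of nodes when `cluster_size=2`, so the summarization cost grows with the number of clusters at every level. In a real pipeline the `summarize` placeholder is where that cost is paid.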
Journey Context:
Standard RAG (chunk -> embed -> similarity search) fails on questions that require synthesis across documents (e.g., 'How do X and Y interact?') because each chunk is retrieved in isolation, without global context. RAPTOR (Stanford) builds a tree: leaf nodes are the original text chunks, parent nodes are LLM-generated summaries of clusters of child nodes, and the process repeats recursively up to a root. At query time, you can retrieve from any level, which enables both precise leaf retrieval (facts) and abstract retrieval near the root (themes). The cost is significant upfront compute to build the tree (one LLM summarization call per cluster), plus storage for the hierarchy. Contrast with GraphRAG, which extracts entities and relations into a graph: RAPTOR is purely abstractive and hierarchical.
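Retrieval from any level can be illustrated with a small hand-built tree. The sketch below is an assumption-laden toy, not the reference implementation: `Node`, `score`, `collapsed_tree_search`, and `tree_traversal_search` are hypothetical names, and `score` uses word-count cosine similarity in place of real embedding similarity. It shows the two query strategies: searching all levels as one flat pool, and walking down from the root.

```python
import math
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Node:
    text: str
    level: int                      # 0 = leaf chunk, higher = more abstract
    children: list = field(default_factory=list)


def score(query, text):
    """Toy similarity (assumption): cosine over word counts."""
    a, b = Counter(query.lower().split()), Counter(text.lower().split())
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def all_nodes(root):
    """Collect every node in the tree, regardless of level."""
    stack, out = [root], []
    while stack:
        node = stack.pop()
        out.append(node)
        stack.extend(node.children)
    return out


def collapsed_tree_search(root, query, k=1):
    """Flatten all levels into one pool and return the k best matches."""
    return sorted(all_nodes(root), key=lambda n: score(query, n.text), reverse=True)[:k]


def tree_traversal_search(root, query, k=1):
    """Start at the root; at each level keep the k best nodes and descend
    into their children. Returns the nodes selected along the way."""
    selected, frontier = [], [root]
    while frontier:
        best = sorted(frontier, key=lambda n: score(query, n.text), reverse=True)[:k]
        selected.extend(best)
        frontier = [child for node in best for child in node.children]
    return selected


# Hand-built example tree: summaries here are written by hand, not by an LLM.
a = Node("caching reduces database load", 0)
b = Node("cache invalidation causes stale reads", 0)
c = Node("replicas improve read throughput", 0)
d = Node("replication lag delays writes reaching replicas", 0)
s1 = Node("caching trades freshness for lower database load", 1, [a, b])
s2 = Node("replication trades consistency for read throughput", 1, [c, d])
root = Node("caching and replication both trade freshness or consistency for performance", 2, [s1, s2])

specific = collapsed_tree_search(root, "stale reads from caching")[0]
abstract = collapsed_tree_search(root, "how do caching and replication trade consistency")[0]
path = tree_traversal_search(root, "stale reads from caching")
print(specific.level, abstract.level, [n.level for n in path])
```

In this toy tree the fact-style query lands on a leaf, while the synthesis-style query matches the root summary best, which is the behaviour the hierarchy is meant to provide. Whether that holds on real data depends on the embedding model and the quality of the summaries.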
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T07:28:44.993842+00:00— report_created — created