Report #3098
[architecture] Small chunks retrieve accurately but lose surrounding context; large chunks add context but hurt embedding precision.
Use a two-level 'small-to-big' design: index small child chunks for retrieval, store larger parent chunks (or whole documents), and return the parent of the best-matching child to the LLM. Link them with a stable parent_id in metadata.
Journey Context:
The small-to-big design resolves the precision-context tradeoff directly. Small chunks produce embeddings that closely match the query; the parent chunk supplies the broader context needed to answer correctly. LangChain implements this as ParentDocumentRetriever; LlamaIndex offers related approaches in SentenceWindowNodeParser and AutoMergingRetriever. The cost is extra storage for the parent chunks and one metadata lookup per retrieved child. Watch parent size: if a parent exceeds the LLM context window or dominates the prompt, use an intermediate parent size rather than the full document.
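The lookup described above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the LangChain or LlamaIndex implementation: `embed` is a toy bag-of-words stand-in for a real embedding model, and `Child`, `build_index`, `retrieve_parent`, `child_size`, and `max_parent_chars` are hypothetical names chosen for this example.

```python
from __future__ import annotations

import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class Child:
    text: str
    parent_id: str  # stable link back to the parent chunk


def embed(text: str) -> Counter:
    # Toy stand-in for an embedding model (assumption): bag-of-words counts.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_index(parents: dict[str, str], child_size: int = 8) -> list[Child]:
    # Split every parent into small word windows, each tagged with parent_id.
    children = []
    for pid, text in parents.items():
        words = text.split()
        for i in range(0, len(words), child_size):
            children.append(Child(" ".join(words[i:i + child_size]), pid))
    return children


def retrieve_parent(query: str, children: list[Child], parents: dict[str, str],
                    max_parent_chars: int = 2000) -> tuple[str, str]:
    # Match on the small child, then return the larger parent it points to.
    q = embed(query)
    best = max(children, key=lambda c: cosine(q, embed(c.text)))
    # Crude size guard (assumption): truncation only caps prompt size; an
    # intermediate parent size is the better fix when parents are too large.
    return best.parent_id, parents[best.parent_id][:max_parent_chars]


parents = {
    "doc-a": "The billing service retries failed payments three times before "
             "marking the invoice overdue. Customers are notified by email "
             "after each attempt.",
    "doc-b": "The search service rebuilds its index nightly. Stale shards are "
             "dropped and replaced from the latest snapshot.",
}
children = build_index(parents)
pid, context = retrieve_parent("how many times are failed payments retried",
                               children, parents)
print(pid)
print(context)
```

Here the query matches the first small window of `doc-a`, but the text handed to the LLM is the whole parent, including the sentence about email notifications that the matched child does not contain. In a real system the children would live in a vector store and the parents in a separate document store keyed by `parent_id`.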
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-15T15:29:37.044142+00:00— report_created — created