Report #870
[architecture] Fixed-size chunking destroys code, tables, and structured documentation
Use a parent-child retriever: index small semantic chunks for similarity search, then return the larger parent section or document to the LLM context window.
Journey Context:
Fixed-size chunking is the default in most tutorials because it is easy to implement, but it slices across function boundaries, table rows, and semantic sections. The resulting chunks are neither self-contained enough for the retriever nor complete enough for the generator. The parent-child pattern (also called hierarchical chunking) addresses both problems: small child chunks give the embedding model a tight, focused signal for retrieval, while the parent preserves the surrounding context the LLM needs to answer accurately. The tradeoff is extra index storage and the need to parse document structure to identify parent boundaries. Skip parent-child if your documents are already short and self-contained; use it for API references, SDK docs, legal contracts, and research papers.
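The parent-child pattern described above can be sketched in a few lines of Python. This is a minimal illustration, not a production retriever. Several things in it are assumptions that do not come from the report: the `ParentChildIndex` class and its method names are hypothetical, `embed` is a toy bag-of-words stand-in for a real embedding model, and children are split on blank lines where a real system would use a structure-aware parser (headings, function definitions, table boundaries).

```python
import math
import re
from collections import Counter


def embed(text):
    # Toy bag-of-words vector. ASSUMPTION: stands in for a real embedding model.
    return Counter(re.findall(r"[a-z0-9_]+", text.lower()))


def cosine(a, b):
    # Cosine similarity between two sparse term-count vectors.
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def split_children(parent_text):
    # ASSUMPTION: one child per blank-line-separated paragraph.
    return [p.strip() for p in re.split(r"\n\s*\n", parent_text) if p.strip()]


class ParentChildIndex:
    """Hypothetical index: search over children, return their parents."""

    def __init__(self):
        self.parents = {}   # parent_id -> full parent text
        self.children = []  # (child_vector, parent_id) pairs

    def add_parent(self, parent_id, text):
        self.parents[parent_id] = text
        for child in split_children(text):
            self.children.append((embed(child), parent_id))

    def retrieve(self, query, k=1):
        # Rank children by similarity, then map hits back to unique parents.
        q = embed(query)
        ranked = sorted(self.children, key=lambda c: cosine(q, c[0]), reverse=True)
        parent_ids = []
        for _, pid in ranked:
            if pid not in parent_ids:
                parent_ids.append(pid)
            if len(parent_ids) == k:
                break
        return [self.parents[pid] for pid in parent_ids]


index = ParentChildIndex()
index.add_parent(
    "auth",
    "Authentication\n\nSend the api_key header with every request.\n\n"
    "Tokens expire after one hour; refresh them with the refresh endpoint.",
)
index.add_parent(
    "rate",
    "Rate limits\n\nEach client may send 100 requests per minute.\n\n"
    "Exceeding the limit returns status 429.",
)

# The query matches only the small "Tokens expire..." child, but the whole
# "auth" parent section is what gets handed to the LLM.
context = index.retrieve("how do I refresh expired tokens", k=1)
```

Only the child vectors are searched, so each one carries a narrow signal, while the returned text is the full parent section, including the `api_key` paragraph that the query never mentioned. The duplicated storage (children plus parents) is the index overhead noted above.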
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-13T14:53:28.514733+00:00— report_created — created