Report #2025
[architecture] A single fixed chunk size cannot serve both precise fact retrieval and broad conceptual synthesis
Index smaller chunks (128–256 tokens) for fact/keyword queries and larger chunks (512–1024 tokens) for narrative/procedural synthesis; use hierarchical parent-child retrieval to get both.
Journey Context:
Small chunks retrieve cleanly because the embedding is not diluted by surrounding text, but they drop cross-sentence context and multi-step explanations. Large chunks preserve procedure and nuance but bury the needle fact in noise. The common "512 tokens" default is a compromise, not an optimum. Parent-child indexing (match the query against small child chunks, then return the larger parent chunk that contains the best match) gives the precision of small chunks with the completeness of large ones.
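A minimal sketch of the parent-child idea, under loud assumptions: whitespace-separated words stand in for tokens, a bag-of-words cosine score stands in for a real embedding model, and the names `build_index`, `retrieve`, `parent_size`, and `child_size` are hypothetical, not taken from any library. It only illustrates the shape of the lookup: score the small children, then hand back their parents.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class Child:
    text: str        # small chunk that is scored against the query
    parent_id: int   # index of the larger chunk it was cut from


def tokenize(text):
    # Toy tokenizer: lowercase alphanumeric runs (assumption, not a real tokenizer).
    return re.findall(r"[a-z0-9]+", text.lower())


def split_words(text, size):
    # Fixed-size windows measured in words, as a stand-in for token counts.
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def build_index(document, parent_size=40, child_size=10):
    # Cut the document into parents, then cut each parent into children.
    parents = split_words(document, parent_size)
    children = [
        Child(piece, pid)
        for pid, parent in enumerate(parents)
        for piece in split_words(parent, child_size)
    ]
    return parents, children


def cosine(a, b):
    # Cosine similarity between two word-count vectors (embedding stand-in).
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query, parents, children, top_k=1):
    # Rank children for precision, then return their distinct parents for context.
    q = Counter(tokenize(query))
    ranked = sorted(
        children,
        key=lambda c: cosine(q, Counter(tokenize(c.text))),
        reverse=True,
    )
    parent_ids = []
    for child in ranked:
        if len(parent_ids) >= top_k:
            break
        if child.parent_id not in parent_ids:
            parent_ids.append(child.parent_id)
    return [parents[pid] for pid in parent_ids]
```

Calling `build_index(text)` once and then `retrieve("some question", parents, children)` returns parent-sized passages even though the match was made on a child-sized one. In a real system the child vectors would live in a vector store and the parents in a separate document store keyed by `parent_id`; that split is an assumption of this sketch, not something the report prescribes.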
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-15T09:48:33.832190+00:00 - report_created - created