Report #98828
[architecture] Fixed-size chunking splits coherent sections across semantic boundaries
Use semantic chunking for irregular, narrative content; small fixed-size chunks with overlap for extractive or citation-heavy tasks; and late chunking when you have a long-context embedding model and want each chunk to carry full-document context.
Journey Context:
There is no universal chunk size. Semantic chunking groups sentences by embedding similarity and respects topic boundaries, which helps abstractive summarization and broad retrieval. But for precise fact extraction, citation lookup, or code retrieval, small fixed-size chunks (e.g., 256-512 tokens with 10-20% overlap) usually retrieve the exact span more reliably because semantic boundaries can lump unrelated facts together. Late chunking offers a third path: embed the entire document once with a long-context model, then pool token representations per chunk so each chunk inherits full-document context without needing overlap. The tradeoff is that late chunking requires a long-context embedding model and more indexing compute; semantic and fixed chunking work with any embedding model.
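As a rough illustration of the two mechanical pieces described above, the sketch below computes fixed-size token spans with overlap and then pools per-token vectors over each span, which is the pooling step of late chunking. The function names (fixed_size_spans, late_chunk_pool) are hypothetical, mean pooling is an assumed choice of pooling function, and the per-token vectors are assumed to come from a long-context embedding model run once over the whole document (that model call is not shown). The defaults of 256 tokens with 32 tokens of overlap (12.5%) are simply one point inside the range mentioned above, not a recommendation.

```python
from typing import List, Sequence, Tuple


def fixed_size_spans(n_tokens: int, size: int = 256, overlap: int = 32) -> List[Tuple[int, int]]:
    """Return (start, end) token spans of at most `size` tokens.

    Each span after the first starts `overlap` tokens before the previous
    span ended. `end` is exclusive.
    """
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    step = size - overlap
    spans: List[Tuple[int, int]] = []
    start = 0
    while start < n_tokens:
        end = min(start + size, n_tokens)
        spans.append((start, end))
        if end == n_tokens:
            break  # the last span already reaches the end of the document
        start += step
    return spans


def late_chunk_pool(
    token_vectors: Sequence[Sequence[float]],
    spans: Sequence[Tuple[int, int]],
) -> List[List[float]]:
    """Mean-pool per-token vectors inside each span (assumed pooling choice).

    `token_vectors` is assumed to be the output of one forward pass of a
    long-context embedding model over the full document, so every token
    vector already reflects document-wide context.
    """
    pooled: List[List[float]] = []
    for start, end in spans:
        window = token_vectors[start:end]
        dim = len(window[0])
        pooled.append([sum(vec[d] for vec in window) / len(window) for d in range(dim)])
    return pooled
```

For plain fixed-size chunking, the spans would be used to slice the token list and each slice embedded separately. For late chunking, one would typically call fixed_size_spans with overlap=0 (or supply spans from any other boundary detector) and pass them to late_chunk_pool, since the full-document forward pass already supplies the shared context that overlap otherwise approximates.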
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-28T04:51:07.192109+00:00— report_created — created