Report #488
[architecture] What chunking strategy should I use for RAG and when does chunk size become the wrong optimization target?
Default to recursive character splitting with hierarchical separators; switch to semantic chunking only when documents shift topics without clear structural boundaries and the embedding cost is justified; use document-based chunking for Markdown or HTML; avoid fixed-size-only chunking in production because it silently cuts across sentences and paragraphs.
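As a minimal sketch of the recursive splitting recommended above, the following tries coarse separators first and falls back to finer ones only when a piece is still too large. The function name `recursive_split`, the separator list, and the character-based limit are illustrative assumptions, not the API of any particular library; a production splitter would usually count tokens and add overlap.

```python
from typing import List, Sequence

# Assumed priority order: paragraphs, lines, sentences, words.
DEFAULT_SEPARATORS = ("\n\n", "\n", ". ", " ")


def recursive_split(text: str, max_chars: int = 200,
                    separators: Sequence[str] = DEFAULT_SEPARATORS) -> List[str]:
    """Split text into chunks of at most max_chars, preferring coarse separators."""
    if len(text) <= max_chars:
        return [text] if text.strip() else []
    if not separators:
        # Nothing left to split on: fall back to a hard fixed-size cut.
        return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]

    sep, finer = separators[0], separators[1:]
    pieces = text.split(sep)
    if len(pieces) == 1:
        # This separator does not occur; try the next finer one.
        return recursive_split(text, max_chars, finer)

    chunks: List[str] = []
    current = ""
    for piece in pieces:
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= max_chars:
            current = candidate  # keep merging neighbours while they fit
            continue
        if current.strip():
            chunks.append(current)
        if len(piece) <= max_chars:
            current = piece
        else:
            # The piece alone is too large: recurse with finer separators.
            chunks.extend(recursive_split(piece, max_chars, finer))
            current = ""
    if current.strip():
        chunks.append(current)
    return chunks


doc = "Para one sentence one. Para one sentence two.\n\nPara two is here."
for chunk in recursive_split(doc, max_chars=30):
    print(repr(chunk))
```

Note that in this sketch the separator at a chunk boundary is dropped, so a sentence that ends a chunk can lose its trailing period; whether that matters depends on the embedding model and on how chunks are displayed.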
Journey Context:
Fixed-size chunking is fast and easy but breaks semantic boundaries and loses context at chunk edges. Recursive splitting preserves paragraphs, sentences, and words in that priority order while still enforcing a size limit, giving most of the benefit at low cost. Semantic chunking aligns chunks with topic transitions by embedding every sentence, but it is slower, produces variable-size chunks, and requires domain-specific threshold tuning. Document-based chunking keeps headers intact and is excellent for structured docs, but chunk sizes become unpredictable. The common mistake is treating chunk size as a model-context problem; it is actually a retrieval problem tied to query patterns. Short factoid queries do best with 128-256 token chunks, while analytical comparisons need 1024+ tokens or hierarchical retrieval.
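To illustrate why semantic chunking costs more and needs threshold tuning, here is a rough sketch that embeds every sentence and starts a new chunk wherever the similarity between adjacent sentences drops below a threshold. The `toy_embed` function is a hypothetical bag-of-words stand-in for a real embedding model, and the names `semantic_chunks` and `threshold` are assumptions made for this example; the default of 0.2 has no significance beyond this toy data.

```python
import math
import re
from collections import Counter
from typing import Callable, Dict, List

Vector = Dict[str, float]


def toy_embed(sentence: str) -> Vector:
    """Hypothetical stand-in for an embedding model: lowercase word counts."""
    return dict(Counter(re.findall(r"[a-z]+", sentence.lower())))


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two sparse vectors."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def semantic_chunks(sentences: List[str],
                    embed: Callable[[str], Vector] = toy_embed,
                    threshold: float = 0.2) -> List[str]:
    """Group consecutive sentences; break where adjacent similarity < threshold."""
    if not sentences:
        return []
    vectors = [embed(s) for s in sentences]  # one embedding call per sentence
    groups: List[List[str]] = [[sentences[0]]]
    for prev_vec, cur_vec, sentence in zip(vectors, vectors[1:], sentences[1:]):
        if cosine(prev_vec, cur_vec) < threshold:
            groups.append([sentence])    # topic shift: start a new chunk
        else:
            groups[-1].append(sentence)  # same topic: extend the current chunk
    return [" ".join(group) for group in groups]


sentences = ["Cats chase mice.", "Cats sleep often.",
             "Bond yields rose.", "Bond prices fell."]
for chunk in semantic_chunks(sentences):
    print(chunk)
```

The sketch makes the trade-offs visible: there is one embedding call per sentence, nothing bounds the chunk length, and the split points depend entirely on a threshold that has to be tuned for the corpus.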
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-13T08:55:26.053757+00:00— report_created — created