Report #97321
[architecture] How should I chunk long documents for RAG so retrieval doesn't lose context or split answers across boundaries?
Default to recursive character splitting with structure-aware separators and 10–20% overlap, but prepend a concise contextual header (document source + section summary) to every chunk and use chunk expansion at query time. Upgrade to semantic or LLM-based chunking only when your queries are genuinely topical and you can absorb the extra embedding cost.
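As a rough illustration of that default, here is a minimal sketch in plain Python, not any particular library's splitter. The separator list, the `max_chars` and `overlap_ratio` values, the header format, and all function names are assumptions made for the example. The section summary is simply passed in as a string; in practice it would be generated per section.

```python
# Assumed separator order: coarse (paragraphs) to fine (words).
SEPARATORS = ["\n\n", "\n", ". ", " "]


def split_recursive(text, max_chars, separators=SEPARATORS):
    """Split text into pieces of at most max_chars, trying coarse separators first."""
    if len(text) <= max_chars:
        return [text] if text.strip() else []
    if not separators:
        # Nothing left to split on: fall back to a hard cut.
        return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]
    sep, finer = separators[0], separators[1:]
    pieces, current = [], ""
    for part in text.split(sep):
        candidate = current + sep + part if current else part
        if len(candidate) <= max_chars:
            current = candidate  # keep packing parts into the current piece
            continue
        if current:
            pieces.append(current)
            current = ""
        if len(part) <= max_chars:
            current = part
        else:
            # A single part is still too long: recurse with finer separators.
            pieces.extend(split_recursive(part, max_chars, finer))
    if current:
        pieces.append(current)
    return [p for p in pieces if p.strip()]


def add_overlap(pieces, overlap_chars):
    """Prefix each piece with the tail of the previous one (may cut mid-word)."""
    out = []
    for i, piece in enumerate(pieces):
        tail = pieces[i - 1][-overlap_chars:] if i > 0 and overlap_chars > 0 else ""
        out.append(tail + piece)
    return out


def chunk_document(text, source, section_summary, max_chars=800, overlap_ratio=0.15):
    """Return chunks of the form header + overlapped body; body stays within max_chars."""
    overlap = int(max_chars * overlap_ratio)
    bodies = add_overlap(split_recursive(text, max_chars - overlap), overlap)
    header = f"[source: {source} | section: {section_summary}]\n"  # hypothetical format
    return [header + body for body in bodies]
```

The header is added on top of the `max_chars` budget, so leave room for it if your embedding model has a tight input limit.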
Journey Context:
Fixed-size chunks are the easiest baseline, but they routinely cut tables, code blocks, and multi-sentence answers in half, leaving the LLM to guess at or hallucinate the missing half. Semantic chunking groups sentences by embedding similarity, which improves topical coherence, yet it can still merge unrelated passages and costs one embedding per sentence. The most reliable cheap win is contextual retrieval: adding a short generated context header to each chunk so that a standalone chunk still makes sense. Pair that with chunk expansion (retrieving neighboring chunks for each hit) and you recover boundary context without giving up small-chunk precision. Reserve expensive semantic/LLM chunking for corpora where section boundaries are weak and queries are broad.
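Chunk expansion can be sketched as follows. This assumes the chunks of a single document are stored in their original order and that the retriever returns the indices of the hits; the function name `expand_hits` and the `window` parameter are hypothetical.

```python
def expand_hits(chunks, hit_indices, window=1):
    """Return one text block per merged span of hits plus `window` neighbors on each side."""
    spans = []
    for idx in sorted(set(hit_indices)):
        start = max(0, idx - window)
        end = min(len(chunks) - 1, idx + window)
        if spans and start <= spans[-1][1] + 1:
            # Overlapping or adjacent to the previous span: merge them.
            spans[-1][1] = max(spans[-1][1], end)
        else:
            spans.append([start, end])
    return ["\n".join(chunks[s:e + 1]) for s, e in spans]


chunks = ["c0", "c1", "c2", "c3", "c4", "c5"]
print(expand_hits(chunks, [1, 2]))  # ['c0\nc1\nc2\nc3']
print(expand_hits(chunks, [0, 5]))  # ['c0\nc1', 'c4\nc5']
```

If the indexed chunks carry overlap, joining neighbors repeats the overlapped text, so it may be worth storing the non-overlapped bodies separately and expanding over those.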
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-25T04:55:38.887367+00:00— report_created — created