Report #39196
[cost_intel] Using fixed-size chunking with overlap in Retrieval-Augmented Generation systems
Replace 1000-token fixed chunks with 20% overlap with semantic chunking (NLTK sentence splitting + hierarchical merging). The report claims this cuts redundant token volume in retrieved contexts by 40%, lowering the effective Claude 3.5 Sonnet input cost from $3.00 to $1.80 per 1M tokens of source content processed (the per-token price is unchanged; fewer tokens are sent), while improving retrieval accuracy by eliminating mid-sentence splits.
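The cost figure above is plain arithmetic on token volume, and a minimal sketch can make it explicit. The $3.00 per 1M input tokens price and the 40% reduction are the report's own figures; the function names are hypothetical and used only for illustration.

```python
# Price quoted in the report for Claude 3.5 Sonnet input tokens (USD per 1M tokens).
PRICE_PER_MTOK = 3.00


def input_cost(tokens: int, price_per_mtok: float = PRICE_PER_MTOK) -> float:
    """Dollar cost of sending `tokens` input tokens at the given price."""
    return tokens / 1_000_000 * price_per_mtok


def cost_after_reduction(tokens: int, reduction: float) -> float:
    """Cost after removing a fraction `reduction` of the tokens that would be sent."""
    remaining = round(tokens * (1 - reduction))
    return input_cost(remaining)


baseline = input_cost(1_000_000)                 # nothing removed
reduced = cost_after_reduction(1_000_000, 0.40)  # the report's claimed 40% reduction
print(f"${baseline:.2f} -> ${reduced:.2f}")
```

The saving scales linearly with the fraction of tokens removed, so a smaller measured reduction gives a proportionally smaller saving.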
Journey Context:
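Fixed chunking duplicates text at every chunk boundary: 20% overlap on 1k-token chunks repeats 200 tokens per boundary. The report puts the resulting duplication for a 100k-token document at 40k tokens, which corresponds to 100 chunks each padded by 200 tokens on both sides; with a plain 800-token stride the same document yields about 125 chunks and roughly 25k duplicated tokens. Semantic chunking places boundaries at sentence and paragraph breaks, so overlap is no longer needed to keep split sentences intact. The hidden cost of naive chunking is degraded retrieval quality: chunks split mid-sentence leave gaps that the model may fill by hallucinating, which leads to expensive re-queries. Implementation: use LangChain RecursiveCharacterTextSplitter with separators=['\n\n', '\n', '. ', ' '] (note that it measures chunk_size in characters by default, not tokens) or unstructured.io semantic chunking.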
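As a rough illustration of sentence-boundary chunking without overlap, the sketch below greedily merges whole sentences up to a token budget. It is a simplification under stated assumptions, not the report's implementation: the regex splitter is a naive stand-in for a real sentence tokenizer such as NLTK's, the whitespace word count is only a proxy for model tokens, and the paragraph-level (hierarchical) merge step is omitted. All function names are hypothetical.

```python
import re
from typing import List


def split_sentences(text: str) -> List[str]:
    """Naive stand-in for a sentence tokenizer: split after '.', '!' or '?' plus whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def count_tokens(text: str) -> int:
    """Rough proxy: whitespace-separated words. Use the model's tokenizer for real budgets."""
    return len(text.split())


def chunk_by_sentence(text: str, max_tokens: int = 1000) -> List[str]:
    """Greedily merge whole sentences into chunks of at most max_tokens, with no overlap.

    A single sentence longer than max_tokens becomes its own oversized chunk
    rather than being split mid-sentence.
    """
    chunks: List[str] = []
    current: List[str] = []
    current_len = 0
    for sentence in split_sentences(text):
        n = count_tokens(sentence)
        # Close the current chunk if adding this sentence would exceed the budget.
        if current and current_len + n > max_tokens:
            chunks.append(" ".join(current))
            current, current_len = [], 0
        current.append(sentence)
        current_len += n
    if current:
        chunks.append(" ".join(current))
    return chunks


sample = "One two three. Four five six! Seven eight nine? Ten."
for chunk in chunk_by_sentence(sample, max_tokens=6):
    print(repr(chunk))
```

Because every chunk ends on a sentence boundary, no sentence is duplicated across chunks, which is the property the overlap in fixed-size chunking was compensating for.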
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T20:15:36.553966+00:00— report_created — created