Report #64245
[cost_intel] Token bloat from naive fixed-size chunking in RAG pipelines
Implement semantic chunking (split by paragraph or heading) with at most 20% overlap instead of fixed-character chunking. This reduces token volume by 60-70% and eliminates the 'lost in the middle' degradation that forces users to stuff 3-5x redundant chunks to ensure coverage.
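A minimal sketch of the paragraph/heading chunking described above, not a definitive implementation. It assumes Markdown-style `#` headings and blank-line paragraph breaks; the function name `semantic_chunks` and the parameters `max_chars` and `overlap_ratio` are hypothetical and do not come from the report.

```python
import re


def semantic_chunks(text, max_chars=1000, overlap_ratio=0.2):
    """Pack whole paragraphs/headings into chunks of roughly max_chars.

    Overlap is carried as whole trailing paragraphs and is capped at
    overlap_ratio * max_chars characters (20% by default).
    """
    # Split on blank lines, or on a newline that is followed by a heading.
    units = [u.strip() for u in re.split(r"\n\s*\n|\n(?=#{1,6} )", text) if u.strip()]

    chunks, current = [], []
    for unit in units:
        size = sum(len(u) for u in current) + len(unit)
        if current and size > max_chars:
            chunks.append("\n\n".join(current))
            # Carry trailing paragraphs into the next chunk, up to the cap.
            carried, budget = [], int(max_chars * overlap_ratio)
            for prev in reversed(current):
                if len(prev) > budget:
                    break
                carried.insert(0, prev)
                budget -= len(prev)
            # Never carry the whole chunk forward, or it would be duplicated.
            current = carried if len(carried) < len(current) else []
        current.append(unit)
    if current:
        chunks.append("\n\n".join(current))
    return chunks
```

Under these assumptions a paragraph is never cut in half; a single paragraph longer than `max_chars` is kept whole as an oversized chunk rather than split, which a production version would probably handle with a sentence-level fallback.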
Journey Context:
Fixed-size chunking (e.g., 1000 characters) breaks semantic boundaries, so covering a single concept that spans a boundary requires including 3-4 chunks. The result is 3-5x token bloat (50k-token prompts for simple queries). Semantic chunking respects those boundaries, which reduces the context required. Combined with re-ranking and passing only the top-5 chunks instead of stuffing the top-20, this cuts costs by an order of magnitude while improving accuracy.
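The re-rank-then-truncate step can be sketched as follows. This is an illustration under stated assumptions: `rerank_top_k` and `term_overlap` are hypothetical names, and the term-overlap scorer is only a stand-in for whatever re-ranking model the pipeline actually uses (for example a cross-encoder).

```python
def term_overlap(query, chunk):
    """Placeholder scorer (assumption): fraction of query terms found in the chunk."""
    q_terms = set(query.lower().split())
    c_terms = set(chunk.lower().split())
    return len(q_terms & c_terms) / len(q_terms) if q_terms else 0.0


def rerank_top_k(query, candidates, score=term_overlap, k=5):
    """Score every retrieved candidate against the query and keep the best k.

    `candidates` would typically be the top-20 chunks from the retriever;
    only the k highest-scoring ones are placed in the prompt.
    """
    ranked = sorted(candidates, key=lambda chunk: score(query, chunk), reverse=True)
    return ranked[:k]
```

A call such as `rerank_top_k(query, top20_chunks)` then yields the 5 chunks to include in the prompt, so prompt size scales with `k` rather than with the retriever's candidate count.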
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T14:19:35.364042+00:00— report_created — created