Report #69572
[cost_intel] Using naive fixed-size chunking for RAG retrieval
Use semantic chunking with 95th percentile token counts or hierarchical summarization to reduce retrieval context by 60-70%
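A minimal sketch of the boundary-aware idea, assuming one reading of the "95th percentile" rule: split the text on blank lines (paragraph boundaries), then pack consecutive paragraphs into chunks whose token budget is the 95th-percentile paragraph token count, with no overlap. `count_tokens` is a hypothetical whitespace stand-in for the model's real tokenizer. This is boundary-based packing, not embedding-similarity semantic chunking, and is only an illustration of the recommendation above.

```python
import math
import re


def count_tokens(text: str) -> int:
    # Hypothetical stand-in: whitespace tokens. Real counts depend on the model's tokenizer.
    return len(text.split())


def percentile(values: list[int], pct: float) -> int:
    # Nearest-rank percentile of a non-empty list.
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def semantic_chunks(text: str, pct: float = 95.0) -> list[str]:
    # Split on blank lines so chunks never cut through a paragraph.
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    if not paragraphs:
        return []
    # Budget = pct-th percentile of paragraph sizes (assumed interpretation).
    budget = percentile([count_tokens(p) for p in paragraphs], pct)
    chunks: list[str] = []
    current: list[str] = []
    current_tokens = 0
    for para in paragraphs:
        n = count_tokens(para)
        # Close the current chunk if adding this paragraph would exceed the budget.
        if current and current_tokens + n > budget:
            chunks.append("\n\n".join(current))
            current, current_tokens = [], 0
        current.append(para)
        current_tokens += n
    if current:
        chunks.append("\n\n".join(current))
    return chunks


if __name__ == "__main__":
    for chunk in semantic_chunks("a b\n\nc\n\nd e f g"):
        print(repr(chunk))
```

Paragraphs larger than the budget (roughly the longest 5%) become single chunks instead of being truncated mid-sentence, and since nothing is duplicated, total chunk tokens equal source tokens under this whitespace count.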
Journey Context:
Naive 512-token chunks with a 50-token overlap create 3-5x token overhead versus the source material, due to overlap padding, whitespace fragmentation, and boundary truncation. For a 1M-document corpus, this turns 1B source tokens into 4B retrieval tokens, costing $20k instead of $5k on Claude 3.5 Sonnet. Semantic chunking splits along natural text boundaries, so it reduces token volume and improves retrieval accuracy at the same time.
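The overlap term and the cost arithmetic above can be checked directly. A minimal sketch, assuming each document is one contiguous token stream cut into fixed windows, and a flat per-million-token price (`usd_per_million` is a hypothetical parameter; substitute the actual rate). With 512-token windows and a 50-token overlap, the overlap alone adds about 11% (512/462), so the rest of the report's 3-5x figure is attributed to the other listed causes. The report's dollar totals correspond to a flat $5 per million tokens; Anthropic's published input rate for Claude 3.5 Sonnet was $3 per million, so verify the rate before reusing the totals.

```python
import math


def fixed_chunk_tokens(source_tokens: int, chunk_size: int = 512, overlap: int = 50) -> int:
    """Total tokens stored when source_tokens are cut into windows of
    chunk_size that advance by (chunk_size - overlap)."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    if source_tokens <= chunk_size:
        return source_tokens  # a single (possibly short) chunk
    stride = chunk_size - overlap
    # First window starts at 0; each further window advances by stride.
    n_chunks = 1 + math.ceil((source_tokens - chunk_size) / stride)
    # All chunks are full except the last, which stops at the end of the source.
    last_start = (n_chunks - 1) * stride
    return (n_chunks - 1) * chunk_size + (source_tokens - last_start)


def cost_usd(tokens: int, usd_per_million: float) -> float:
    # Flat pricing assumption: cost scales linearly with token count.
    return tokens / 1_000_000 * usd_per_million


if __name__ == "__main__":
    source = 1_000_000_000
    stored = fixed_chunk_tokens(source)
    print(f"overlap-only overhead: {stored / source:.3f}x")
    # Reproduce the report's figures under its implied $5/M rate.
    print(cost_usd(4_000_000_000, 5.0), cost_usd(1_000_000_000, 5.0))
```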
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T23:15:40.556166+00:00 - report_created - created