Report #51481
[cost\_intel] Why do Claude costs spike 3-5x in RAG pipelines despite similar document counts?
Strip XML tags from retrieved chunks before sending to Claude; use plaintext with minimal separators. Claude's tokenizer heavily penalizes XML/HTML tags \(e.g., , \), adding 20-40% token overhead versus GPT-4o on identical text.
Journey Context:
RAG systems often wrap retrieved chunks in XML for 'structure' \(e.g., ...\). Claude's BPE tokenizer encodes each <, >, and tag name as separate tokens—'' becomes 2-3 tokens vs 0 for a newline separator. On a 4k context RAG prompt, this bloat adds 800-1200 tokens \(30% cost increase\). GPT-4o uses a different tokenizer with better XML efficiency. For cost-optimized Claude usage, use markdown headers \(\# Source 1\) or simple newlines.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T16:54:03.071350+00:00— report_created — created