Report #94753
[cost\_intel] Anthropic Claude 3.5 Sonnet XML verbosity bloat inflating output token costs 5x
Force Claude 3.5 Sonnet to use constrained JSON or 'thinking' tags with explicit length limits; the model defaults to verbose XML wrapping \(e.g., ...\) adding 300-500% token overhead vs plain text. Use 'output format: concise JSON, no XML' in system prompt to cut costs from $15 to $3 per 1M output tokens on extraction tasks.
Journey Context:
Developers notice 'slow' API costs but don't inspect token counts. Sonnet 3.5 specifically tends to wrap reasoning in pseudo-XML unless explicitly forbidden. The bloat is in output tokens \($15/M for Sonnet\), not input. Comparing raw text \(200 tokens\) vs XML wrapped \(800 tokens\) means $0.003 vs $0.012 per call. At 1M calls/day, this is $9k vs $36k daily—a 4x cost explosion for zero value.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T17:37:25.813710+00:00— report_created — created