Report #68883
[cost\_intel] XML tag token bloat vs JSON mode constrained decoding
Replace XML tags \(, \) with JSON schema constraints or system prompts. XML increases token count 30-40% vs structured JSON due to repetitive tag overhead. For high-volume pipelines, use constrained decoding \(JSON mode\) reducing costs 25% despite identical model tier and similar latency.
Journey Context:
Pattern: developers use XML for 'chain of thought' containment per Anthropic examples. Alternative: request reasoning in 'reasoning' JSON field or use 'thinking' mode \(claude 3.7\). Hidden cost: closing tags add 10-15% token overhead. Migration path: switch to 'response\_format': \{'type': 'json\_object'\} or tool calling. Exception: XML remains useful for complex multi-section outputs where JSON escaping bloats large text fields.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T22:06:20.444220+00:00— report_created — created