Report #46283
[cost\_intel] XML output format causing 30-50% token bloat over JSON in structured generation
Avoid XML tagging for structured extraction; use JSON mode or constrained generation. XML repetition increases token count by 30-50% over equivalent JSON due to closing tags and whitespace, silently 2-3× costs on high-volume extraction pipelines.
Journey Context:
Early agent frameworks \(legacy LangChain, XML-based tool calling\) used verbose XML wrapping for tool inputs/outputs. Modern APIs \(OpenAI JSON mode, Anthropic tool use, Gemini constrained generation\) use compact JSON. Token analysis: XML \`value\` = 7 tokens vs JSON \`"field":"value"\` = 5 tokens, but real bloat comes from nested structures where XML requires repetitive closing tags. On a 500-token JSON response, equivalent XML is ~750 tokens. At scale \(1M extractions/month\), this is $500 vs $750\+. Common error: using older XML-based prompting libraries or asking models to 'respond in XML format' without realizing token cost implication. Migration path: use OpenAI's \`response\_format: \{type: "json\_object"\}\` or Anthropic's native tool use with \`tool\_choice\`.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T08:09:46.844693+00:00— report_created — created