Report #53661
[cost_intel] Why do my API costs spike 5-10x when switching to structured output modes?
JSON mode/Structured Outputs add roughly 20-40% output-token overhead vs free-form text because of schema-mandated fields, repeated keys, escaped quotes, and whitespace. XML tagging (e.g., wrapping fields in tag blocks) adds roughly 15-25%. Hidden cost: models by default generate verbose explanations before and after the structured block unless explicitly constrained with 'Output ONLY JSON, no markdown.' Mitigation: (1) instruct the model to emit compact JSON with no whitespace, (2) remove schema descriptions from the prompt and let strict mode enforce the schema (with 'additionalProperties: false') instead, (3) for Claude, try XML tags instead of JSON (often fewer tokens, given Claude's bias toward XML from training), (4) use 'response_format: {type: json_object}' without function calling (avoids the token tax of function descriptions).
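Mitigations (1) and (2) can be sketched in Python. This is a minimal illustration, not a benchmark: the 1000-record payload is hypothetical, character counts stand in for tokens (real counts depend on the model's tokenizer), and the `response_format` dict is assumed to follow OpenAI's Structured Outputs `json_schema` layout, so verify it against current API documentation before use. The schema deliberately carries no `description` fields; strict mode enforces the structure instead of prose in the prompt.

```python
import json

# Hypothetical sample payload (illustrative only): 1000 small records.
records = [{"id": i, "name": f"item{i}", "in_stock": i % 2 == 0} for i in range(1000)]

# Character counts are a rough proxy only; actual token counts depend on
# the model's tokenizer, so compare ratios rather than absolute numbers.
pretty = json.dumps(records, indent=2)                 # whitespace-heavy output
compact = json.dumps(records, separators=(",", ":"))   # mitigation (1): no whitespace

# Mitigation (2): let the schema constrain the output instead of restating it
# in the prompt. Assumed to match OpenAI's "json_schema" response_format;
# check current docs. Strict mode expects every property listed in "required"
# and "additionalProperties": False on each object.
response_format = {
    "type": "json_schema",
    "json_schema": {
        "name": "records",  # hypothetical schema name
        "strict": True,
        "schema": {
            "type": "object",
            "properties": {
                "records": {
                    "type": "array",
                    "items": {
                        "type": "object",
                        "properties": {
                            "id": {"type": "integer"},
                            "name": {"type": "string"},
                            "in_stock": {"type": "boolean"},
                        },
                        "required": ["id", "name", "in_stock"],
                        "additionalProperties": False,
                    },
                },
            },
            "required": ["records"],
            "additionalProperties": False,
        },
    },
}

if __name__ == "__main__":
    print(f"pretty:  {len(pretty)} chars")
    print(f"compact: {len(compact)} chars")
    print(f"compact/pretty: {len(compact) / len(pretty):.2f}")
```

The local serialization only shows how much of the same data is whitespace; run the comparison on a sample of your own model responses, where the compact/pretty ratio is the figure worth tracking.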
Journey Context:
Engineers see 'JSON mode' and assume it's just a parser wrapper. It isn't. The model generates the JSON token by token, and every quote, brace, and space costs tokens. Example: 'The answer is 42' is about 5 tokens; '{"answer": 42}' is about 8 (exact counts depend on the tokenizer). Scale that to nested objects: 1000 records with schema overhead can reach 3-4x the token count. Worse, models trained on markdown tend to wrap JSON in ```json ... ``` fences, often with prose around them, which can double the tokens. The fix isn't 'use a smaller model'; it's 'reduce token verbosity through output-format choice.' Claude particularly likes XML; GPT likes function calling. Match the format to the model's training bias. Quality signature of token bloat: markdown fences or a 'Here is the JSON you requested:' preamble in the output.
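The quality signature of token bloat described above can be checked mechanically. A minimal sketch, assuming responses are available as plain strings (e.g., pulled from logs); `find_bloat` and `extract_json` are hypothetical helper names, and the regex handles only a single fenced block. It flags fences and surrounding prose, then recovers the payload so you can see how much of each response was overhead.

```python
import json
import re

# Matches one fenced block such as ```json ... ``` (language tag optional)
# and captures the content between the fences.
FENCE_RE = re.compile(r"```[A-Za-z]*\s*(.*?)\s*```", re.DOTALL)


def find_bloat(raw: str) -> list[str]:
    """Return the token-bloat signatures found in a model response."""
    signs = []
    if FENCE_RE.search(raw):
        signs.append("markdown fence")
    # Unwrap the fence so only prose outside the payload remains around it.
    body = FENCE_RE.sub(r"\1", raw).strip()
    if body and body[0] not in "{[":
        signs.append("preamble before JSON")
    if body and body[-1] not in "}]":
        signs.append("text after JSON")
    return signs


def extract_json(raw: str):
    """Best-effort recovery of the first JSON object/array in a response."""
    text = FENCE_RE.sub(r"\1", raw)
    starts = [i for i in (text.find("{"), text.find("[")) if i != -1]
    if not starts:
        raise ValueError("no JSON object or array found")
    # raw_decode parses one JSON value and ignores anything after it.
    obj, _ = json.JSONDecoder().raw_decode(text[min(starts):])
    return obj


if __name__ == "__main__":
    bloated = 'Here is the JSON you requested:\n```json\n{"answer": 42}\n```'
    lean = '{"answer":42}'
    for resp in (bloated, lean):
        payload = json.dumps(extract_json(resp), separators=(",", ":"))
        print(find_bloat(resp), f"{len(payload)}/{len(resp)} chars are payload")
```

A non-empty result from `find_bloat` on a meaningful share of responses suggests the prompt needs the 'Output ONLY JSON, no markdown.' constraint or a stricter output mode.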
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T20:33:52.787421+00:00— report_created — created