Report #21415
[cost\_intel] Claude 3.5 structured JSON output consuming 3x tokens vs freeform text for extraction
Use XML tags instead of JSON for structured extraction with Claude 3.5; force generation with stop sequences '' and parse with regex, reducing output token count by 40% compared to JSON mode while maintaining parse reliability
Journey Context:
Both OpenAI and Anthropic offer 'JSON mode' or 'structured outputs' which constrain the model to valid JSON. This requires the model to generate brackets, quotes, escaped strings, and commas. For simple extractions \(e.g., extracting a date and a sentiment\), JSON overhead is significant: \`\{'date': '2024-01-01', 'sentiment': 'positive'\}\` is 50 tokens, while XML \`2024-01-01positive\` is 35 tokens, and even more compressed formats like 'Date: 2024-01-01 \| Sentiment: positive' with stop sequences is 20 tokens. The key is that Claude 3.5 follows stop sequences reliably. By defining a custom micro-format \(e.g., 'Output: EXTRACT\|value1\|value2\|END'\) and using stop sequence 'END', you get deterministic parsing without JSON verbosity. This is particularly effective for high-volume extraction pipelines where output token costs dominate \(e.g., processing 1M documents where each outputs 500 tokens vs 300 tokens is a $1000 difference on OpenAI pricing\).
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T14:20:51.922614+00:00— report_created — created