Report #35466
[cost_intel] Why does using JSON mode double my token costs for simple extractions?
Avoid JSON mode for simple key-value extraction; instead, parse a free-form completion with a regex, or apply logit bias to a plain (non-JSON-mode) completion, to save roughly 20-40% of output tokens. Reserve structured output for nested schemas or strict validation, and expect about 1.2-1.5x the token count of free-form output for equivalent content.
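A minimal sketch of the regex route, assuming the model was asked to reply with a bare ISO date and that the completion text is already available as a string. The function name `extract_date` and the pattern `ISO_DATE` are hypothetical names chosen for illustration, not part of any provider's API.

```python
import re
from datetime import date
from typing import Optional

# Matches an ISO-style date (YYYY-MM-DD) anywhere in the completion text.
ISO_DATE = re.compile(r"\b(\d{4})-(\d{2})-(\d{2})\b")


def extract_date(completion: str) -> Optional[date]:
    """Return the first valid ISO date in a free-form completion, or None.

    A None result is the signal to retry (or fall back to structured
    output), which is the failure case the report trades against the
    extra JSON tokens.
    """
    match = ISO_DATE.search(completion)
    if match is None:
        return None
    year, month, day = map(int, match.groups())
    try:
        # date() rejects impossible values such as month 13 or day 45.
        return date(year, month, day)
    except ValueError:
        return None
```

The parser tolerates surrounding text, so a reply such as "The date is 2024-01-01." still parses, and an impossible date is treated as a failure rather than passed downstream.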
Journey Context:
Developers often enable JSON mode 'just in case' to avoid parsing errors, without realizing the hidden cost. Constraining output to valid JSON adds tokens that carry no information (quoted keys, mandatory punctuation, and often whitespace), so the same content can take a longer sequence than it would in a custom delimited format. For a simple extraction such as a single date, JSON mode emits {"date": "2024-01-01"} (15+ tokens) where free-form output is just 2024-01-01 (4 tokens); exact counts depend on the tokenizer. At high volume, this overhead can exceed the cost of occasional parsing failures and retries.
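The trade-off between JSON overhead and retry cost can be sketched with a simple expected-value model. This is an illustration under stated assumptions, not a measurement: each call is assumed to fail to parse independently with a fixed probability, a failed call is retried in full (prompt included), and input and output tokens are treated as equally priced, which most providers do not do. The function names and the 200-token prompt are hypothetical; the 4 and 15 output-token figures are the report's own.

```python
def expected_tokens_per_success(prompt_tokens: float,
                                output_tokens: float,
                                failure_rate: float) -> float:
    """Expected tokens spent until one call yields a parseable result.

    With independent failures, the expected number of calls is
    1 / (1 - failure_rate), and every call pays prompt + output tokens.
    """
    if not 0.0 <= failure_rate < 1.0:
        raise ValueError("failure_rate must be in [0, 1)")
    return (prompt_tokens + output_tokens) / (1.0 - failure_rate)


def break_even_failure_rate(prompt_tokens: float,
                            free_form_tokens: float,
                            json_tokens: float) -> float:
    """Free-form failure rate at which it costs the same as JSON mode.

    Assumes JSON mode never fails to parse. Below this rate, free-form
    output plus retries is the cheaper option.
    """
    free_form_call = prompt_tokens + free_form_tokens
    json_call = prompt_tokens + json_tokens
    return 1.0 - free_form_call / json_call


# Assumed 200-token prompt; output sizes taken from the date example.
threshold = break_even_failure_rate(200, 4, 15)
print(f"free-form wins while its parse-failure rate stays below {threshold:.1%}")
```

Under these assumptions the break-even failure rate is just over 5%. The longer the prompt relative to the output, the lower that threshold falls, because each retry re-sends the whole prompt.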
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T14:00:00.021945+00:00— report_created — created