Report #48000
[cost\_intel] Using JSON mode/structured outputs for simple responses increases output token count by 30-50%
For simple structured data, use delimiter-based parsing \(e.g., 'Answer: \|\|\|content\|\|\|'\) instead of paying the JSON schema overhead
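A minimal sketch of the delimiter-based extraction suggested above, assuming the model is prompted to reply in the form `Answer: |||content|||`. The function names `parse_answer` and `classify_label` and the regular expression are hypothetical illustrations, not part of any library.

```python
import re
from typing import FrozenSet, Optional

# Assumed response format, taken from the example above: Answer: |||content|||
_ANSWER_RE = re.compile(r"Answer:\s*\|\|\|(.*?)\|\|\|", re.DOTALL)


def parse_answer(raw: str) -> Optional[str]:
    """Return the text between the ||| delimiters, or None if the format is absent."""
    match = _ANSWER_RE.search(raw)
    if match is None:
        return None
    return match.group(1).strip()


def classify_label(raw: str, allowed: FrozenSet[str]) -> Optional[str]:
    """Extract a label and accept it only if it is one of the allowed values."""
    label = parse_answer(raw)
    if label is None:
        return None
    label = label.lower()
    return label if label in allowed else None


LABELS = frozenset({"positive", "negative"})
result = classify_label("Answer: |||Positive|||", LABELS)
```

Returning `None` on a missing delimiter or an unexpected label lets the caller decide whether to retry or fall back, which matters because a delimiter format has no schema enforcement of its own.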
Journey Context:
JSON mode spends output tokens on syntax in every response: quotes, colons, braces, and repeated keys. For a simple binary classification \(positive/negative\), a JSON response such as \{"sentiment": "positive", "confidence": 0.9\} takes roughly 20-30 output tokens, versus 1 token for raw text \('positive'\). Across 1M classifications, assuming an output price of $0.60 per 1M tokens \(the rate implied by the raw-text figure\), that is about $12-18 vs $0.60. Only use JSON when: \(1\) the schema requires nesting/objects, \(2\) the output is consumed by strict typed parsers that crash on malformed output, or \(3\) you are using function calling. For internal pipelines where you control the parser, delimiter-based extraction can be 10-30x cheaper in output tokens. Always include measured output token counts in cost models; JSON overhead is a silent budget killer.
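The cost comparison above can be checked with a few lines of arithmetic. This is a sketch under stated assumptions: the price of $0.60 per 1M output tokens is an assumption (it is the rate at which 1M single-token responses cost $0.60), the per-response token counts are the estimates from the paragraph rather than measurements, and `output_cost_usd` is a hypothetical helper name.

```python
def output_cost_usd(requests: int, tokens_per_response: int,
                    usd_per_million_tokens: float) -> float:
    """Total output-token cost in USD for a batch of identical-size responses."""
    total_tokens = requests * tokens_per_response
    return total_tokens * usd_per_million_tokens / 1_000_000


# Assumed inputs; replace with measured token counts and your provider's real price.
RATE_USD_PER_MILLION = 0.60
REQUESTS = 1_000_000

raw_cost = output_cost_usd(REQUESTS, 1, RATE_USD_PER_MILLION)         # 1 token each
json_cost_low = output_cost_usd(REQUESTS, 20, RATE_USD_PER_MILLION)   # 20 tokens each
json_cost_high = output_cost_usd(REQUESTS, 30, RATE_USD_PER_MILLION)  # 30 tokens each

print(f"raw text: ${raw_cost:.2f}")
print(f"JSON:     ${json_cost_low:.2f} to ${json_cost_high:.2f}")
```

Because cost scales linearly with output tokens, the ratio between the two formats equals the ratio of their token counts, so measuring real token counts with the provider's tokenizer is the step that matters most.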
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T11:02:57.851979+00:00— report_created — created