Report #43194
[cost_intel] Why did my API costs triple when switching to structured JSON output?
Avoid JSON mode with repetitive schemas in high-volume loops. Use function calling or delimiter-based prompt formatting instead, which can save 30-40% of tokens compared with JSON mode.
Journey Context:
Developers often assume that JSON mode is the efficient way to get structured output. In practice, JSON mode can implicitly repeat the schema description in every call, either through system prompt bloat or through grammar-constraint overhead. When extracting 100 fields from 1000 records, JSON mode sends the schema metadata 1000 times. A better approach is function calling (schema defined once in tools) or, for regular data, simple prompt formatting with delimiters. Measured difference on a typical extraction: about 4k tokens per call with JSON mode versus about 2.8k tokens per call with the alternative, roughly 30% fewer. JSON mode also has higher latency due to grammar-constrained decoding.
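As a rough sketch of the arithmetic behind the figures above, the snippet below totals the tokens for 1000 calls at 4k versus 2.8k tokens per call, and shows one way a delimiter-formatted reply could be parsed back into fields. The names `CallProfile`, `relative_saving`, and `parse_delimited`, the `|` separator, and the sample field names are all hypothetical and chosen only for illustration. The per-call token counts are the ones quoted in this report, not fresh measurements.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CallProfile:
    """Token footprint of one extraction call (figures taken from this report)."""
    name: str
    tokens_per_call: int


def total_tokens(profile: CallProfile, calls: int) -> int:
    """Total tokens consumed when the same call shape is repeated `calls` times."""
    return profile.tokens_per_call * calls


def relative_saving(baseline: CallProfile, candidate: CallProfile) -> float:
    """Fraction of tokens saved by `candidate` relative to `baseline`."""
    return (baseline.tokens_per_call - candidate.tokens_per_call) / baseline.tokens_per_call


def parse_delimited(line: str, fields: list[str], sep: str = "|") -> dict[str, str]:
    """Parse one delimiter-formatted reply line into a field -> value mapping.

    Assumes the model was asked to answer with values in a fixed order,
    separated by `sep` (a hypothetical prompt convention).
    """
    parts = [part.strip() for part in line.split(sep)]
    if len(parts) != len(fields):
        raise ValueError(f"expected {len(fields)} values, got {len(parts)}")
    return dict(zip(fields, parts))


# Per-call figures quoted in the report.
json_mode = CallProfile("json_mode", 4000)
compact = CallProfile("function_calling_or_delimiters", 2800)
calls = 1000

print(total_tokens(json_mode, calls))   # tokens across the whole loop, JSON mode
print(total_tokens(compact, calls))     # tokens across the whole loop, alternative
print(f"{relative_saving(json_mode, compact):.0%}")

# Hypothetical reply line for a three-field extraction.
record = parse_delimited("Acme Corp | 2024-03-01 | 129.50", ["vendor", "date", "amount"])
print(record)
```

The delimiter approach only suits regular, flat data. A value that itself contains the separator would break the parse, so the length check is there to fail loudly rather than misalign fields silently.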
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T02:58:38.525962+00:00— report_created — created