Report #51008
[cost\_intel] Not accounting for structured output \(JSON mode / function calling\) token overhead in cost calculations
Budget 20-40% additional output tokens for JSON mode and function calling compared to plain-text responses of typical length, and considerably more for very short outputs. A classification returning 'positive' \(1 token\) becomes '\{"sentiment": "positive"\}' \(5-7 tokens\). At scale, this can silently inflate output costs 3-5x for short-response workloads such as classification.
Journey Context:
Structured output is essential for production systems but carries a hidden cost premium that most cost models miss. Output tokens are 3-5x more expensive than input tokens on most models, so output bloat is disproportionately expensive. The pattern: a plain-text classification response might be 5-10 tokens, but the same response in JSON, with schema keys, quotes, and formatting, balloons to 25-50 tokens. At GPT-4o pricing \($15/M output tokens\), classifying 1M documents costs about $150 in output tokens with 10-token plain-text responses and about $750 with 50-token JSON responses. The 5x difference is purely formatting overhead. Mitigation strategies: \(1\) Use the shortest workable key names, for example 's' instead of 'sentiment'. \(2\) Request minimal schemas: don't nest objects when flat key-value pairs work. \(3\) Consider post-processing: have the model output plain text and parse it programmatically. The tradeoff: JSON mode provides reliability guarantees \(valid JSON, schema adherence\) that may be worth the cost premium for critical pipelines.
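The cost arithmetic and mitigation 3 can be sketched in a few lines of Python. This is a minimal illustration, not a definitive implementation: the function names, the label set, and the default price are assumptions made for the example, and the token counts are the report's rough estimates rather than measured values.

```python
# Assumption: the $15 per million output tokens rate quoted in this report.
# Replace with the current rate for the model you actually use.
PRICE_PER_MILLION_OUTPUT_USD = 15.0


def output_cost_usd(num_requests: int, tokens_per_response: float,
                    price_per_million: float = PRICE_PER_MILLION_OUTPUT_USD) -> float:
    """Estimate output-token spend for a batch of requests."""
    total_tokens = num_requests * tokens_per_response
    return total_tokens / 1_000_000 * price_per_million


# Hypothetical label set for a sentiment classifier (mitigation 3).
ALLOWED_LABELS = {"positive", "negative", "neutral"}


def parse_label(raw: str) -> str:
    """Normalize a plain-text model reply to an allowed label, or raise.

    Validation that JSON mode would otherwise provide has to happen here.
    """
    label = raw.strip().strip(".\"'").lower()
    if label not in ALLOWED_LABELS:
        raise ValueError(f"unexpected model output: {raw!r}")
    return label


if __name__ == "__main__":
    docs = 1_000_000
    plain = output_cost_usd(docs, 10)       # ~10-token plain-text reply
    structured = output_cost_usd(docs, 50)  # ~50-token JSON reply
    print(f"plain: ${plain:,.2f}  json: ${structured:,.2f}  "
          f"ratio: {structured / plain:.1f}x")
    print(parse_label(" Positive.\n"))
```

With these inputs the structured variant comes out at five times the plain-text spend. Note that the plain-text route moves validation into your own code: `parse_label` rejects anything outside the label set, so callers need a retry or fallback path for replies the model phrases unexpectedly.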
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T16:05:56.966540+00:00— report_created — created