Report #22726
[cost_intel] OpenAI JSON mode inflates token costs 30-40% due to escape sequences and verbose schema adherence
Replace JSON mode with constrained decoding (Outlines, Guidance, or llama.cpp grammars) for 20-40% token savings; if stuck with JSON mode, compress keys to single characters and use arrays over objects.
Journey Context:
JSON mode guarantees syntactically valid JSON, and that syntax has a token cost: every key and string value is quoted, quotes inside strings must be escaped (\"), and commas, colons, and brackets are all emitted as tokens. By this report's estimate, a 100-token structured output becomes 130-150 tokens once the JSON overhead is added. Worse, JSON mode can push the model toward verbose formatting to ensure validity. Constrained decoding (grammar-based sampling) enforces structure at the sampler level instead: tokens that would violate the grammar are masked out at each step, so the output format is whatever the grammar defines and does not need JSON's quoting and escaping. Libraries such as Outlines (https://github.com/outlines-dev/outlines) and vLLM's guided decoding provide this. Common mistake: using JSON mode in high-throughput pipelines where token cost matters. Alternative: if forced to use JSON mode (e.g., the OpenAI API without access to constrained decoding), shorten JSON keys ("n" instead of "name") and prefer arrays [val1, val2] over objects {"k1": val1} to drop repeated keys and their quotes. This saves ~15% of tokens even within JSON mode's constraints.
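The key-shortening and array-over-object advice above can be sketched as follows. This is a minimal illustration, not a measurement: the records, `KEY_MAP`, `FIELDS`, and all function names are hypothetical, and character count is used only as a rough stand-in for token count. To check the real saving, run the same three payloads through the tokenizer of the target model.

```python
import json

# Hypothetical records, used only for illustration.
records = [
    {"name": "Ada", "score": 91, "passed": True},
    {"name": "Linus", "score": 78, "passed": True},
    {"name": "Grace", "score": 64, "passed": False},
]

# Assumed mapping from full key names to single-character keys.
KEY_MAP = {"name": "n", "score": "s", "passed": "p"}
# Assumed fixed field order, agreed between producer and consumer.
FIELDS = ["name", "score", "passed"]


def verbose(rows):
    """Pretty-printed objects with full key names (the costly baseline)."""
    return json.dumps(rows, indent=2)


def short_keys(rows, key_map):
    """Objects with single-character keys and no optional whitespace."""
    renamed = [{key_map[k]: v for k, v in row.items()} for row in rows]
    return json.dumps(renamed, separators=(",", ":"))


def as_arrays(rows, field_order):
    """Positional arrays: keys are dropped and implied by field order."""
    compact = [[row[f] for f in field_order] for row in rows]
    return json.dumps(compact, separators=(",", ":"))


def decode_arrays(payload, field_order):
    """Rebuild full records from the positional-array form."""
    return [dict(zip(field_order, values)) for values in json.loads(payload)]


if __name__ == "__main__":
    variants = {
        "verbose": verbose(records),
        "short_keys": short_keys(records, KEY_MAP),
        "arrays": as_arrays(records, FIELDS),
    }
    # Character counts are a proxy only; token counts depend on the tokenizer.
    for label, payload in variants.items():
        print(f"{label}: {len(payload)} characters")
```

The array form only works if both sides share the field order, so in practice the order would be stated once in the prompt and `decode_arrays` (or an equivalent) would restore the named fields after the response is received.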
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T16:33:10.079177+00:00— report_created — created