Report #96153
[cost_intel] Why does enabling JSON mode on GPT-4/Claude silently increase costs by 5-10x on some tasks?
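JSON mode can trigger verbose explanatory text inside values and pretty-printed output. Force compact schemas, add explicit 'as short as possible' instructions, and set a 'max_tokens' cap as a backstop (a cap truncates output rather than shortening it, so a response that hits the cap may be invalid JSON). For simple fields, switch to regex extraction instead. Either route avoids the 3-5x token multiplier from formatted JSON with whitespace and explanations.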
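For the regex route, a minimal sketch, assuming the model was asked for a short plain-text answer rather than JSON. The sample reply, the pattern, and the `extract_amount` helper are hypothetical names made up for this illustration, not part of any provider's API.

```python
import re

# Hypothetical plain-text reply from a model told to answer "as short as possible".
reply = "Total: 42.50 USD"

# Hypothetical pattern: a decimal amount followed by a three-letter currency code.
AMOUNT_RE = re.compile(r"(\d+(?:\.\d+)?)\s*([A-Z]{3})")


def extract_amount(text):
    """Return (amount, currency) from a short reply, or None if nothing matches."""
    match = AMOUNT_RE.search(text)
    if match is None:
        return None
    return float(match.group(1)), match.group(2)


result = extract_amount(reply)
```

This only fits cases where the field has a predictable shape; anything nested or free-form still needs structured output.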
Journey Context:
Developers assume JSON mode just formats output. In practice, models often put explanatory text inside JSON values or pretty-print with newlines and spaces, so a 50-token answer can become 500 tokens of formatted JSON. The cost increase is easy to miss because it shows up in output tokens, not input tokens. Using 'response_format': {'type': 'json_object'} without strict schema constraints invites bloat. The fix is either constrained grammars (regex) for simple cases or explicit token limits to cap verbosity, keeping in mind that a limit cuts the output off mid-stream and can leave the JSON unparseable.
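The whitespace part of the bloat, and the truncation risk of a hard cap, can be illustrated locally without calling any model. This is a rough sketch: the `record` payload is invented for the example, and character count is used only as a crude stand-in for token count (real tokenizers will give different ratios).

```python
import json

# Invented payload standing in for a model's structured answer.
record = {
    "name": "Ada Lovelace",
    "born": 1815,
    "fields": ["mathematics", "computing"],
}

# Pretty-printed form: newlines and indentation are all billable output.
pretty = json.dumps(record, indent=4)

# Compact form: same data, no optional whitespace.
compact = json.dumps(record, separators=(",", ":"))


def is_valid_json(text):
    """Return True if text parses as JSON, False otherwise."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


# Simulate a cap cutting the output short: the prefix is no longer valid JSON.
truncated = compact[:20]

print(len(pretty), len(compact))
print(is_valid_json(compact), is_valid_json(truncated))
```

The compact and pretty forms decode to the same object, so asking for compact output loses nothing. The truncated prefix shows why a token cap works better as a backstop than as the main way to shorten responses.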
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T19:58:28.113686+00:00— report_created — created