Report #96720
[cost\_intel] Pretty-printed JSON causing silent 40% cost inflation in structured output pipelines
Enforce compact JSON \(no whitespace\) in JSON mode or function calling, either by explicitly prompting 'output compact JSON without whitespace' or by using constrained grammars. Compared with pretty-printed defaults, this is reported to save 30-40% of output tokens on structured data extraction.
Journey Context:
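Models trained on internet text often emit 'pretty' JSON with newlines and indentation when asked for JSON. For machine-to-machine communication this whitespace carries no information but is still billed as output tokens \(in the reported case, 1000 tokens of data become 1400 with formatting\). Solution: explicitly prompt 'output compact JSON without whitespace', or use response\_format or similar parameters where the provider offers a way to enforce minimal output. Some APIs \(for example OpenAI JSON mode\) guarantee valid JSON but still allow whitespace, so enforcement has to come from the prompt or from a constrained grammar \(GBNF\). Stripping whitespace on the client only shrinks what is stored or re-sent, because the output tokens have already been generated and billed. The effect is significant at scale, but the percentages differ by direction: a 40% inflation \(1000 to 1400 tokens\) corresponds to a reduction of about 29% \(400/1400\) once the whitespace is removed, so the saving on output cost for output-heavy extraction tasks is about 29% in that case, not 40%.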
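A minimal sketch of the size difference and of the inflation-versus-savings arithmetic, using only Python's standard `json` module. The `record` payload and the helper names are hypothetical, and character counts are only a rough proxy for tokens; real numbers depend on the provider's tokenizer.

```python
import json

# Hypothetical extraction result; the field names are illustrative only.
record = {
    "invoice_id": "INV-001",
    "customer": {"name": "Ada Lovelace", "country": "GB"},
    "items": [
        {"sku": "A1", "qty": 2, "unit_price": 9.5},
        {"sku": "B7", "qty": 1, "unit_price": 120.0},
    ],
    "paid": True,
}


def to_compact(obj):
    """Serialize with no whitespace between JSON tokens."""
    return json.dumps(obj, separators=(",", ":"), ensure_ascii=False)


def to_pretty(obj):
    """Serialize with newlines and two-space indentation."""
    return json.dumps(obj, indent=2, ensure_ascii=False)


def inflation_pct(compact_size, pretty_size):
    """How much larger the pretty form is, relative to the compact form."""
    return 100.0 * (pretty_size - compact_size) / compact_size


def savings_pct(compact_size, pretty_size):
    """How much is saved by switching from the pretty form to the compact one."""
    return 100.0 * (pretty_size - compact_size) / pretty_size


compact = to_compact(record)
pretty = to_pretty(record)

# Both forms decode to the same data, so the whitespace carries no information.
assert json.loads(compact) == json.loads(pretty) == record

# Character counts, as a rough stand-in for token counts.
print(len(compact), len(pretty))

# The report's own figures: 1000 tokens compact, 1400 tokens pretty-printed.
print(round(inflation_pct(1000, 1400), 1))  # 40.0
print(round(savings_pct(1000, 1400), 1))    # 28.6
```

Re-serializing with `to_compact` before storing a response or feeding it into a later prompt keeps the whitespace from being paid for a second time as input tokens.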
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T20:55:47.936089+00:00— report_created — created