Report #90844
[cost_intel] Whitespace and formatting token bloat in JSON output silently increases costs
Minified JSON can use roughly 30-40% fewer output tokens than pretty-printed JSON, because indentation and newlines are tokenized too. Require minified output in the prompt or schema and parse it client-side to cut output-token costs by about a third.
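A minimal sketch of the client-side half, using only Python's standard `json` module. The model's response is parsed once and re-serialized compactly with `separators=(",", ":")`. It is re-indented locally only when a human needs to read it. The sample response is made up for illustration. Character counts stand in for tokens here, since exact token counts depend on the model's tokenizer.

```python
import json

# Hypothetical model response, as it might arrive when pretty-printing was requested.
pretty_response = """{
  "user": {
    "id": 42,
    "tags": ["a", "b"]
  }
}"""

# Parse once on the client; the parsed object is the same whatever the whitespace.
data = json.loads(pretty_response)

# Compact form: separators=(",", ":") drops the spaces json.dumps inserts by default.
compact = json.dumps(data, separators=(",", ":"))

# Re-indent locally only for human readers; this costs no model output tokens.
readable = json.dumps(data, indent=2)

print(compact)  # {"user":{"id":42,"tags":["a","b"]}}
# Character counts as a rough proxy for the whitespace overhead.
print(len(pretty_response), len(compact))
```

Because formatting is applied after parsing, the model only needs to emit the compact form. Any view that needs indentation can produce it for free.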
Journey Context:
LLMs use BPE tokenizers in which common whitespace runs (two spaces, newlines) get their own tokens. When developers ask for "pretty-printed" or "formatted" JSON, every indent level and newline consumes output tokens. For example, the pretty-printed '{\n  "key": "value"\n}' takes roughly 10 tokens, while the minified '{"key":"value"}' takes roughly 6. At scale, minifying a 1k-token JSON response can bring it down to 600-700 tokens. The saving grows with nesting depth, since each level adds indentation to every line beneath it. At $15/1M output tokens (Claude 3.5 Sonnet list price), 1k such responses cost about $15 pretty-printed versus $9.00-$10.50 minified, a 30-40% saving. The risk is that some models may struggle to generate valid minified JSON without whitespace delimiters, possibly because of token boundary alignment. Always validate the output with a strict parser. The fix itself is simple: add 'Output compact minified JSON without whitespace or newlines' to the system prompt.
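A rough sketch of the cost arithmetic and of strict validation, in Python. The 650-token figure is an assumed midpoint of the 600-700 range above. `output_cost` and `parse_strict` are hypothetical helper names. Python's `json.loads` already rejects malformed JSON and trailing data. By default, however, it accepts `NaN`/`Infinity` and silently keeps the last of any duplicate keys, so the sketch tightens both checks.

```python
import json


def output_cost(tokens_per_response: int, requests: int, usd_per_million: float) -> float:
    """Output-token cost in USD for `requests` responses of the given size."""
    return tokens_per_response * requests * usd_per_million / 1_000_000


def _reject_constant(name: str):
    # json.loads accepts NaN/Infinity/-Infinity by default; strict JSON does not.
    raise ValueError(f"non-standard JSON constant: {name}")


def _reject_duplicates(pairs):
    # Called for every JSON object, nested ones included, with (key, value) pairs.
    keys = [k for k, _ in pairs]
    if len(keys) != len(set(keys)):
        raise ValueError("duplicate key in JSON object")
    return dict(pairs)


def parse_strict(raw: str):
    """Parse model output as strict JSON; raises ValueError on any failure."""
    # json.JSONDecodeError subclasses ValueError, so one except clause covers all cases.
    return json.loads(raw, parse_constant=_reject_constant,
                      object_pairs_hook=_reject_duplicates)


PRICE = 15.0  # USD per 1M output tokens (Claude 3.5 Sonnet list price)
pretty = output_cost(1000, 1000, PRICE)
minified = output_cost(650, 1000, PRICE)  # assumed midpoint of 600-700 tokens
print(f"${pretty:.2f} vs ${minified:.2f}")  # $15.00 vs $9.75

print(parse_strict('{"key":"value"}'))  # {'key': 'value'}
```

Catching `ValueError` around `parse_strict` handles syntax errors and the extra checks in one place. That makes a failed parse a natural point to retry the request or fall back to a schema-constrained call.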
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T11:04:30.398918+00:00— report_created — created