Report #83096
[cost\_intel] Why does using JSON mode silently increase costs by 5-10x on long outputs?
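Avoid native JSON mode for large array generation \(>50 items\). Instead, prompt for a compact line-oriented format, such as newline-delimited JSON \(NDJSON\) with one positional row per line, optionally wrapped in a Markdown code block, and parse it client-side. The report estimates 30-50% fewer output tokens, because the schema keys are no longer repeated in every element.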
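The client-side parsing step might look like the minimal sketch below. It assumes the model was prompted to emit one JSON array per line in a fixed field order, since plain NDJSON objects would still repeat every key. The `FIELDS` list and the `parse_positional_ndjson` helper are hypothetical names for illustration and are not part of any provider SDK.

```python
import json

# Hypothetical schema: the field order the prompt asked the model to follow.
FIELDS = ["name", "email"]


def parse_positional_ndjson(raw, fields):
    """Turn lines like ["Ada", "ada@example.com"] into dicts keyed by `fields`."""
    records = []
    for line in raw.splitlines():
        line = line.strip()
        # Skip blank lines and Markdown code-fence markers around the payload.
        if not line or line.startswith("`" * 3):
            continue
        try:
            row = json.loads(line)
        except json.JSONDecodeError:
            # A truncated or malformed line is dropped instead of failing the batch.
            continue
        # Keep only rows that match the expected shape.
        if isinstance(row, list) and len(row) == len(fields):
            records.append(dict(zip(fields, row)))
    return records


fence = "`" * 3
sample = (
    fence + "json\n"
    '["Ada", "ada@example.com"]\n'
    '["Lin", "lin@example.com"]\n'
    '["Bo", "bo@exa\n'  # simulated truncation at the output token limit
    + fence
)
records = parse_positional_ndjson(sample, FIELDS)
print(records)
```

Because each line is parsed independently, a response cut off mid-row loses only that row, whereas a truncated JSON array would fail to parse as a whole.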
Journey Context:
Native JSON mode \(OpenAI/Anthropic\) enforces valid JSON at the token sampling level, and a JSON array of objects has to spell out the complete key names for every field in every element \(e.g., \{"name": "...", "email": "..."\} repeated 100 times\). This "key bloat" scales linearly with array length. In contrast, prompting for CSV or positional NDJSON \(one row per line, keys implied by column position\) removes the repeated keys and cuts the token count substantially. Note that standard NDJSON with one full object per line still repeats every key, so it does not help on its own. Quality is reported to remain equivalent for machine-readable data. The savings depend on how long the keys are relative to the values; the report puts them anywhere from 30-50% of output tokens to a 5-10x cost difference on large extractions. Only use strict JSON mode when schema validation is critical and output size is small.
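The linear growth of the key overhead can be checked with a rough sketch like the one below. It compares character counts, which are only a proxy for tokens; real token counts depend on the model's tokenizer and are not measured here. The two-field schema and the generated rows are made-up illustration data.

```python
import csv
import io
import json

# Hypothetical schema, matching the example in the text.
FIELDS = ["name", "email"]


def as_json_array(rows):
    """Serialize rows as a JSON array of objects (keys repeated per element)."""
    return json.dumps([dict(zip(FIELDS, row)) for row in rows])


def as_positional_csv(rows):
    """Serialize rows as CSV: keys appear once in the header, values by position."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(FIELDS)
    writer.writerows(rows)
    return buf.getvalue()


def parse_positional_csv(text):
    """Rebuild the list of dicts client-side from the CSV text."""
    return [dict(row) for row in csv.DictReader(io.StringIO(text))]


rows = [(f"user{i}", f"user{i}@example.com") for i in range(100)]
json_text = as_json_array(rows)
csv_text = as_positional_csv(rows)

# Character counts as a rough stand-in for output tokens.
print("JSON array chars:", len(json_text))
print("CSV chars:       ", len(csv_text))

# Both encodings carry the same records.
assert parse_positional_csv(csv_text) == json.loads(json_text)
```

With short keys and short values, as in this sample, the positional form comes out markedly smaller. The gap widens as keys get longer or the schema gains more fields, and narrows when long values dominate each row.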
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T22:03:41.563385+00:00— report_created — created