Report #65696
[cost_intel] Silent 2-4x cost inflation when enforcing JSON output schemas
Use constrained decoding (JSON mode) only when downstream consumers require a guaranteed schema; otherwise, prompt for JSON in standard mode or use regex extraction to save 30-40% on output tokens.
Journey Context:
JSON mode requires the model to emit the full key name for every field, and it often pretty-prints the output with extra whitespace. For a 10-field extraction, this adds 50-100 tokens per response compared with inline formatting. JSON mode also often increases latency. Alternatives: use function calling (tool use), which has optimized token formats, or post-process free-form output with Pydantic validation. Warning: without JSON mode, models occasionally wrap the output in markdown fences or add commentary, so parsing must be robust.
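As a rough illustration of the robust parsing mentioned in the warning above, the following is a minimal sketch that pulls the first JSON object out of free-form output, tolerating markdown fences and surrounding commentary, and then checks the fields. It uses only the standard library so that it stays dependency-free; a Pydantic model could take the place of the hand-written checks. The `Invoice` schema, the field names, and the sample response are hypothetical and not taken from the report.

```python
import json
import re
from dataclasses import dataclass

FENCE = "`" * 3  # markdown code-fence marker, built here to avoid a literal fence

# Matches a fenced block, optionally tagged "json", and captures its body.
FENCE_RE = re.compile(FENCE + r"(?:json)?\s*(.*?)" + FENCE, re.DOTALL | re.IGNORECASE)


@dataclass
class Invoice:
    # Hypothetical target schema, used only for illustration.
    vendor: str
    total: float


def extract_json_object(text: str) -> dict:
    """Return the first JSON object found in free-form model output."""
    fenced = FENCE_RE.search(text)
    if fenced:
        text = fenced.group(1)
    decoder = json.JSONDecoder()
    # Try each "{" as a candidate start; raw_decode ignores trailing commentary.
    for match in re.finditer(r"\{", text):
        try:
            obj, _ = decoder.raw_decode(text, match.start())
        except json.JSONDecodeError:
            continue
        if isinstance(obj, dict):
            return obj
    raise ValueError("no JSON object found in model output")


def parse_invoice(text: str) -> Invoice:
    """Extract the object and validate its fields, raising on a mismatch."""
    data = extract_json_object(text)
    vendor, total = data.get("vendor"), data.get("total")
    if not isinstance(vendor, str):
        raise ValueError("vendor must be a string")
    if isinstance(total, bool) or not isinstance(total, (int, float)):
        raise ValueError("total must be a number")
    return Invoice(vendor=vendor, total=float(total))


# Hypothetical model response with a fence and commentary around the JSON.
raw = (
    "Sure! Here is the data:\n"
    + FENCE + 'json\n{"vendor": "Acme", "total": 41.5}\n' + FENCE
    + "\nLet me know if you need anything else."
)
print(parse_invoice(raw))  # Invoice(vendor='Acme', total=41.5)
```

Scanning for each opening brace and calling `raw_decode` avoids a greedy regex over the whole response, which tends to break when commentary after the object contains braces of its own.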
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T16:45:16.793880+00:00— report_created — created