Report #85885
[cost_intel] Token bloat in JSON mode and function calling: silent cost explosion
Avoid native JSON mode on OpenAI/Anthropic for high-volume APIs that return simple flat structures. Instead, use regex-constrained generation or 'markdown JSON' with smaller models, which can reduce token count by 30-50%. Reserve native JSON mode for deeply nested schemas that require strict validation.
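As a minimal sketch of the flat-structure alternative, the snippet below parses a compact one-record-per-line response and rejects any line that does not match a regex. Note that this validates output after the fact; it does not constrain decoding the way a constrained generation library would. The schema (`id`, `name`, `score`), the `parse_compact` helper, and the sample response are hypothetical, and no API call is made.

```python
import re

# Hypothetical flat schema: id (integer), name (single word), score (one digit).
ROW_RE = re.compile(r"(\d+),(\w+),(\d)")

def parse_compact(text: str) -> list:
    """Parse one record per line, raising on any line that fails the regex."""
    records = []
    for line in text.strip().splitlines():
        m = ROW_RE.fullmatch(line.strip())
        if m is None:
            raise ValueError(f"malformed row: {line!r}")
        rec_id, name, score = m.groups()
        records.append({"id": int(rec_id), "name": name, "score": int(score)})
    return records

# Stand-in for a model response in the compact format (assumption).
sample_output = "1,alice,7\n2,bob,3\n"
print(parse_compact(sample_output))
```

A failed match is a natural point to retry the request or fall back to native JSON mode, so strictness is only paid for when the cheap path breaks.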
Journey Context:
JSON mode and function calling often inject hidden system tokens for schema validation, and the output repeats every field name for each object in an array. Generating 100 objects with 5 fields each might cost 3k tokens in strict JSON mode versus 1k in a comma-separated format validated with a regex. Signature: cost per request climbs much faster with array item count than the size of the underlying data would suggest. Alternative: use constrained generation libraries (Outlines, Guidance) or fine-tune a small model to output valid JSON without verbose schema tokens. Warning: some APIs charge for hidden 'reasoning' or 'schema' tokens that are not visible in the response.
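To sanity-check the 100-objects, 5-fields estimate on your own payloads, a rough comparison like the following may help. It is only a sketch: `rough_tokens` is a crude stand-in that counts words and punctuation marks, not a real tokenizer, and the sample records are made up. It also cannot see any hidden schema or reasoning tokens a provider adds, so use the provider's tokenizer and usage fields for actual billing numbers.

```python
import json
import re

# Hypothetical sample data: 100 records with 5 flat fields each.
FIELDS = ["id", "name", "city", "score", "active"]
records = [
    {"id": i, "name": f"user{i}", "city": "Oslo", "score": i % 10, "active": True}
    for i in range(100)
]

def rough_tokens(text: str) -> int:
    """Crude token proxy: each word and each punctuation mark counts as one."""
    return len(re.findall(r"\w+|[^\w\s]", text))

# Verbose form: every object repeats all five field names.
as_json = json.dumps(records)

# Compact form: one comma-separated row per record, field order fixed by FIELDS.
as_rows = "\n".join(",".join(str(r[f]) for f in FIELDS) for r in records)

json_tokens = rough_tokens(as_json)
row_tokens = rough_tokens(as_rows)
per_item_overhead = (json_tokens - row_tokens) / len(records)

print(f"JSON: {json_tokens} rough tokens, rows: {row_tokens} rough tokens")
print(f"extra rough tokens per item in JSON: {per_item_overhead:.1f}")
```

Tracking the per-item overhead against array length in production logs is one way to spot the signature described above before it shows up on the invoice.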
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T02:44:28.099203+00:00— report_created — created