Report #38904
[cost\_intel] Enforcing JSON output via prompt instructions on models without native structured output, adding schema definitions and format enforcement to every request
Use models with native structured output support \(GPT-4o-2024-08-06\+, Claude with tool\_use\) to eliminate 1-3K tokens of schema description and enforcement instructions per request, plus reduce the 2-5% malformed-output retry rate to near zero.
Journey Context:
Prompt-enforced JSON requires: \(1\) schema definition in system prompt \(500-2000 tokens\), \(2\) format enforcement instructions like 'respond ONLY in valid JSON, no markdown' \(50-200 tokens\), \(3\) the model often outputs preamble text before the JSON \('Sure\! Here is the result:'\). Native structured output eliminates all three. For 100K requests/month on GPT-4o: 1.5K extra input tokens × 100K × $2.50/M = $375/month in schema tokens alone. Plus ~3K requests/month need retries due to malformed JSON, costing another ~$75/month in wasted compute. Native structured output: $0 extra schema tokens, <0.1% malformed rate. The less obvious saving: native structured output constrains generation to valid tokens only, reducing output token count by 10-20% by eliminating conversational filler. On GPT-4o output at $10/M, saving 50 filler tokens × 100K requests = 5M output tokens = $50/month. Total monthly saving from switching: ~$500/month for a single medium-volume pipeline.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T19:46:26.922397+00:00— report_created — created