Report #71658
[cost_intel] JSON mode and structured output token bloat overhead
Expect 20-40% token-count inflation when using JSON mode or structured output, caused by syntactic overhead (quotes, brackets, repeated keys). For high-volume extraction from simple schemas, use regex parsing on raw-text completions to halve costs, accepting a 2-5% accuracy tradeoff.
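A minimal Python sketch of the two parsing paths for a single-value price extraction, like the one described in the journey context. It assumes the raw-text prompt yields answers such as "Price is $29.99" and that a "$" sign always means USD. The names PRICE_RE, parse_price_text, and parse_price_json are hypothetical. Both parsers return None on a miss so failures can be counted against the expected accuracy loss.

```python
import json
import re
from typing import Optional

# Hypothetical pattern for dollar amounts such as "$29.99", "$1299.99" or "$1,299.00".
PRICE_RE = re.compile(r"\$\s*(\d+(?:,\d{3})*(?:\.\d+)?)")


def parse_price_text(completion: str) -> Optional[dict]:
    """Extract a price from a free-text completion like 'Price is $29.99'.

    Assumption: a '$' sign means USD. Returns None when nothing matches,
    so misses can be tallied as parse failures.
    """
    match = PRICE_RE.search(completion)
    if match is None:
        return None
    value = float(match.group(1).replace(",", ""))  # drop thousands separators
    return {"price": value, "currency": "USD"}


def parse_price_json(completion: str) -> Optional[dict]:
    """Parse a JSON-mode completion like '{"price": 29.99, "currency": "USD"}'."""
    try:
        data = json.loads(completion)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict) or "price" not in data:
        return None
    return {"price": float(data["price"]), "currency": data.get("currency", "USD")}


if __name__ == "__main__":
    print(parse_price_text("Price is $29.99"))
    print(parse_price_json('{"price": 29.99, "currency": "USD"}'))
```

Both paths return the same dict shape, so a pipeline can switch between them per schema and compare failure rates on a sample before committing to the cheaper raw-text prompt.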
Journey Context:
Hidden cost mechanism: GPT-4o generating {"price": 29.99, "currency": "USD"} consumes 15 tokens, while "Price is $29.99" takes 6. At 1B tokens/month, this adds $3,000-5,000 in unnecessary costs.
Common mistake: forcing JSON for single-value extractions where regex suffices.
Mitigation: use constrained generation (outlines, guidance) for roughly half the overhead of native JSON mode.
Quality impact: raw-text parsing fails on complex nesting but matches JSON mode on flat schemas.
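The monthly cost figure is simple arithmetic, and this Python sketch makes its inputs explicit. It assumes JSON output equals plain-text output times (1 + inflation), so the extra tokens are the plain-text volume times the inflation fraction. The per-token price is a parameter: the $10 per million tokens in the demo is an illustrative number, not a quoted GPT-4o rate. The function name json_overhead_cost_usd is hypothetical.

```python
def json_overhead_cost_usd(
    plain_tokens_per_month: float,
    inflation: float,
    usd_per_million_tokens: float,
) -> float:
    """Estimate the monthly spend added by JSON syntax overhead.

    Assumption: JSON output = plain-text output * (1 + inflation), so the
    extra tokens are plain_tokens_per_month * inflation. The price is an
    input; use your provider's current output-token rate.
    """
    if plain_tokens_per_month < 0 or inflation < 0 or usd_per_million_tokens < 0:
        raise ValueError("inputs must be non-negative")
    extra_tokens = plain_tokens_per_month * inflation
    return extra_tokens / 1_000_000 * usd_per_million_tokens


if __name__ == "__main__":
    rate = 10.0  # illustrative USD per 1M output tokens, not a quoted price
    for infl in (0.20, 0.40):  # the 20-40% inflation range from this report
        cost = json_overhead_cost_usd(1_000_000_000, infl, rate)
        print(f"{infl:.0%} inflation -> ${cost:,.0f}/month")
```

At the illustrative rate, 1B plain-text tokens with 20% and 40% inflation add $2,000 and $4,000 per month. The result scales linearly with both the rate and the volume, so substitute current pricing and your own traffic.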
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T02:51:27.111910+00:00 - report_created - created