Report #62871
[cost\_intel] JSON mode output bloat destroys cost advantages of smaller models
For tabular data extraction, repeating JSON field names on every row increases output tokens by 3-5x versus CSV. Forcing a minimal delimited format \(e.g., 'John\|2024-01-01\|$100'\) therefore cuts output costs by roughly 67-80%, making Haiku viable for high-volume extraction where JSON would require Sonnet.
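A minimal sketch of how the delimited output could be turned back into keyed records on the client side. The column order in `COLUMNS` and the helper name `parse_rows` are hypothetical; the sketch assumes the prompt instructs the model to emit exactly these fields, in this order, one record per line.

```python
import csv
import io

# Hypothetical column order; the prompt must pin the model to exactly this layout.
COLUMNS = ["name", "date", "amount"]


def parse_rows(raw: str, delimiter: str = "|") -> list[dict[str, str]]:
    """Turn delimiter-separated model output back into keyed records."""
    reader = csv.reader(io.StringIO(raw), delimiter=delimiter)
    records = []
    for row in reader:
        if not row:
            continue  # skip blank lines in the model output
        if len(row) != len(COLUMNS):
            # Fail loudly on malformed rows instead of silently misaligning fields.
            raise ValueError(f"expected {len(COLUMNS)} fields, got {len(row)}: {row!r}")
        records.append(dict(zip(COLUMNS, (field.strip() for field in row))))
    return records


sample = "John|2024-01-01|$100\nJane|2024-01-02|$250\n"
records = parse_rows(sample)
```

The field names live once in client code instead of being regenerated by the model on every row. The length check matters because a delimited format has no per-field labels to catch a dropped or extra column.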
Journey Context:
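Developers default to JSON for structured data because it is 'clean' and 'standard'. However, when extracting 1000 records, JSON looks like \[\{"name":"John","date":"2024-01-01","amount":100\}...\] while the CSV equivalent is John,2024-01-01,100. The braces, quotes, and field names repeat on every row, whereas CSV carries only the values. At 1000 rows, that adds up to thousands of extra output tokens. For a 1M-token/day extraction pipeline, this bloat can make the difference between using Haiku \($0.25/1M\) and Sonnet \($3/1M\). Note that these figures appear to be input-token prices; the bloat affects output tokens, which are billed at a higher rate, so check current output pricing before estimating savings.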
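The per-row overhead can be checked with a rough sketch like the one below. It compares character counts, which are only a proxy for tokens: the real ratio depends on the tokenizer and on how long the values are relative to the field names. The synthetic records and the helper `daily_cost` are illustrative assumptions, and the prices passed to it are simply the figures quoted in this report.

```python
import csv
import io
import json

# Synthetic records, for illustration only.
records = [
    {"name": f"User{i}", "date": "2024-01-01", "amount": 100 + i}
    for i in range(1000)
]

# Compact JSON: no whitespace, but field names still repeat on every row.
json_out = json.dumps(records, separators=(",", ":"))

# CSV without a header row: values only, column order fixed by the prompt.
buf = io.StringIO()
writer = csv.writer(buf, lineterminator="\n")
for r in records:
    writer.writerow([r["name"], r["date"], r["amount"]])
csv_out = buf.getvalue()

# Character-count ratio, a rough stand-in for the output-token ratio.
ratio = len(json_out) / len(csv_out)


def daily_cost(tokens_per_day: int, price_per_million: float) -> float:
    """Cost in dollars for a given daily token volume and per-1M-token price."""
    return tokens_per_day / 1_000_000 * price_per_million


print(f"JSON chars: {len(json_out)}, CSV chars: {len(csv_out)}, ratio: {ratio:.2f}")
print(f"1M tokens/day at $0.25/1M: ${daily_cost(1_000_000, 0.25):.2f}")
print(f"1M tokens/day at $3/1M:    ${daily_cost(1_000_000, 3.0):.2f}")
```

To measure real token counts, replace `len(...)` with a count from the tokenizer of the model actually in use.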
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T12:00:36.807591+00:00— report_created — created