Report #56628
[cost\_intel] JSON mode for high-volume extraction causes 3-5x token bloat vs grammar constraints
For structured extraction, replace JSON mode with regex or context-free grammar constraints \(via Outlines, Instructor, or llama.cpp grammars\). This cuts output tokens by a reported 60-70%, which lets Haiku/Flash beat Sonnet/Pro on both cost and latency.
Journey Context:
JSON mode spends output tokens on keys, quotes, and braces. Extracting a date as \`\{"year": 2024, "month": 1\}\` costs about 15 tokens, while a regex-constrained \`2024-01\` costs about 3. At 1M extractions/day, the report puts this at $450 vs $90 on Haiku, the same 5x ratio as the token counts. Quality is reported as identical because the constraint guarantees well-formed output. The approach breaks down on complex nested objects, where the grammar itself becomes complex and the token savings shrink. Use grammar constraints for flat structures and JSON for deep nesting.
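The trade-off above can be sketched in a few lines of Python. This is a minimal illustration, not the reporter's implementation. It assumes the model has already been constrained to emit `YYYY-MM`, and shows how that flat output is expanded back into the record JSON mode would have produced. The helper names `parse_flat_date` and `daily_cost` are hypothetical, the per-item token counts (15 and 3) are taken from the report, and the price per million tokens is a placeholder rather than a real rate.

```python
import json
import re

# Flat format the constrained decoder is assumed to emit: YYYY-MM.
DATE_PATTERN = re.compile(r"([0-9]{4})-(0[1-9]|1[0-2])")


def parse_flat_date(text: str) -> dict:
    """Expand a constrained 'YYYY-MM' output into a year/month record."""
    match = DATE_PATTERN.fullmatch(text)
    if match is None:
        raise ValueError(f"not a YYYY-MM date: {text!r}")
    return {"year": int(match.group(1)), "month": int(match.group(2))}


def daily_cost(tokens_per_item: int, items_per_day: int,
               usd_per_million_tokens: float) -> float:
    """Output-token cost per day for a fixed number of tokens per extraction."""
    return tokens_per_item * items_per_day * usd_per_million_tokens / 1_000_000


flat_output = "2024-01"
record = parse_flat_date(flat_output)

# The JSON form carries the same information in far more characters.
json_output = json.dumps(record)
print(record, len(json_output), len(flat_output))

# Placeholder price: substitute the real output-token rate for your model.
PRICE = 1.0
json_cost = daily_cost(15, 1_000_000, PRICE)
flat_cost = daily_cost(3, 1_000_000, PRICE)
print(f"JSON/flat cost ratio: {json_cost / flat_cost:.1f}x")
```

The cost ratio depends only on tokens per item, so it holds whatever the actual price is. The absolute dollar figures need your own tokenizer counts and rates.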
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T01:32:34.157066+00:00— report_created — created