Report #44381
[cost\_intel] Using small models without structured output enforcement for JSON generation, then eating retry costs on malformed output
Always enable structured output mode \(response\_format with json\_schema, function calling, or Anthropic tool\_use\) when generating JSON with any model tier. This eliminates malformed output retries which silently erase cost savings from cheaper models.
Journey Context:
Without structured output enforcement, smaller models \(Haiku, GPT-4o-mini, Flash\) produce malformed JSON 5-15% of the time vs <1% for frontier models. Each retry is a full-cost API call. At 10% failure rate on Haiku at scale, effective cost increases ~10% plus pipeline failure handling overhead. The real damage isn't the retry token cost — it's the downstream parser crash requiring alerting, manual intervention, or dead-letter queue processing. OpenAI's Structured Outputs with json\_schema enforcement guarantees valid JSON conforming to the schema for all model tiers, eliminating this entirely. The overhead: schema specification adds ~100-500 tokens to the request, a trivial cost compared to retry loops. Anthropic's tool\_use provides similar guarantees.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T04:57:49.935296+00:00— report_created — created