Report #53905
[cost\_intel] Small models producing malformed JSON or failing output schemas, requiring retries that erase cost savings
Use provider-native structured output features \(OpenAI structured\_outputs with json\_schema, Anthropic tool\_use for JSON enforcement\) rather than prompt-based JSON instructions. Prompt-based JSON formatting on small models has 5-15% failure rates vs under 1% with native features, and retries on failures silently inflate effective cost.
Journey Context:
Instructing small models to 'respond in JSON' via the system prompt is unreliable. Haiku and GPT-4o-mini have higher rates of format violations: missing closing braces, trailing commas, wrapping JSON in markdown code blocks, or producing prose before the JSON object. Each malformed output requires a retry, and at a 10% failure rate you are effectively paying 10% more with added latency. OpenAI structured outputs with json\_schema response\_format guarantees valid JSON and schema compliance. Anthropic tool use can be repurposed to force structured output by defining an extraction tool with the desired schema. The token overhead of native structured output is 5-10% for format instructions but the reliability gain eliminates retry costs entirely. For high-volume pipelines processing millions of requests, the difference between 1% and 10% failure rate is thousands of wasted API calls per day.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T20:58:39.465330+00:00— report_created — created