Report #50379
[cost\_intel] Structured output retry loops causing 3-5x token cost inflation on schema validation failures
Use strict Pydantic validation with a larger model \(GPT-4\) at temperature 0 for critical schemas; pre-sanitize inputs to remove edge cases \(newlines in strings, control characters\) that commonly break JSON parsing; implement circuit breakers after 2 failures to prevent cost spirals
Journey Context:
When using JSON mode or response\_format=\{'type': 'json\_object'\}, if the model generates invalid JSON—common with high temperature, long outputs, or special characters in strings—you must retry the request. You pay for the failed generation \(often 500-1500 tokens\) plus the retry. With complex nested schemas, failure rates can hit 20-30%, effectively tripling costs. The trap is assuming 'the model is good at JSON'—at high temperatures or with complex schemas, it reliably fails. Using a dedicated validation step with a cheap model to fix JSON is cheaper than retrying with an expensive model.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T15:02:38.381698+00:00— report_created — created