Report #61295
[cost_intel] Structured output validation failures causing exponential token burn on retries
Implement circuit breakers after 2 consecutive JSON/Schema failures; use function calling instead of JSON mode for complex schemas; pre-validate schema feasibility with Haiku/GPT-3.5 before expensive model calls
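The circuit-breaker recommendation above could be sketched roughly as follows. This is a minimal illustration, not a reference implementation: `ValidationCircuitBreaker`, `CircuitOpenError`, and the `generate` / `validate` callables are hypothetical names introduced here. `generate` stands in for whatever performs the model call and returns raw text, and `validate` stands in for your schema check.

```python
import json
from typing import Any, Callable


class CircuitOpenError(RuntimeError):
    """Raised when the breaker refuses to spend more tokens on retries."""


class ValidationCircuitBreaker:
    """Stops calling the model after N consecutive JSON/schema failures."""

    def __init__(self, max_consecutive_failures: int = 2) -> None:
        self.max_consecutive_failures = max_consecutive_failures
        self.consecutive_failures = 0

    @property
    def is_open(self) -> bool:
        return self.consecutive_failures >= self.max_consecutive_failures

    def reset(self) -> None:
        """Close the breaker again, e.g. after the schema or prompt was fixed."""
        self.consecutive_failures = 0

    def run(self, generate: Callable[[], str],
            validate: Callable[[Any], bool]) -> Any:
        # Each loop iteration is one billed model call (hypothetical `generate`).
        while not self.is_open:
            raw = generate()
            try:
                parsed = json.loads(raw)
            except json.JSONDecodeError:
                self.consecutive_failures += 1
                continue
            if validate(parsed):
                self.consecutive_failures = 0  # success closes the streak
                return parsed
            self.consecutive_failures += 1
        raise CircuitOpenError(
            f"{self.consecutive_failures} consecutive validation failures; "
            "not retrying"
        )
```

Because the failure counter lives on the breaker object rather than inside a single call, a breaker shared across requests for the same schema also blocks later requests until `reset()` is called. Whether that sharing is desirable depends on how often failures are schema-wide rather than input-specific.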
Journey Context:
When using JSON mode or constrained decoding, a validation failure forces a complete retry that resends the full context. With GPT-4 Turbo in JSON mode, if the model outputs invalid JSON (common with nested quotes), you burn ~$0.03-0.06 per retry at 4k context. If all 3 retries fire, a single request can cost up to 4x the single-call price (the initial attempt plus 3 retries). Worse: some libraries automatically retry on a validation error without exponential backoff. The hidden trap: JSON mode disables certain sampling optimizations that function calling preserves. Function calling uses native schema validation at the sampling level, reducing malformed output by 80% compared to JSON mode with post-processing. The quality signature: if GPT-4o-mini succeeds but GPT-4 Turbo fails on the same schema, it's usually a JSON escaping issue, not a capability gap.
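The retry arithmetic above can be made explicit with a short sketch. It assumes each attempt is billed at the same price and that retries are counted on top of the initial attempt. The function names and the 20% failure rate are hypothetical, used only for illustration; the $0.03-0.06 per-attempt figures come from this report.

```python
def worst_case_cost(cost_per_attempt: float, max_retries: int) -> float:
    """Total cost if the first attempt and every retry fail validation."""
    return cost_per_attempt * (1 + max_retries)


def expected_attempts(failure_rate: float, max_retries: int) -> float:
    """Expected number of billed attempts, capped at 1 + max_retries.

    Assumes each attempt fails independently with probability failure_rate.
    """
    # Attempt k (0-based) is only made if the previous k attempts all failed.
    return sum(failure_rate ** k for k in range(max_retries + 1))


if __name__ == "__main__":
    for per_attempt in (0.03, 0.06):
        total = worst_case_cost(per_attempt, max_retries=3)
        print(f"${per_attempt:.2f} per attempt, 3 retries: worst case ${total:.2f}")
    # Hypothetical failure rate, not a measured value.
    print(f"expected attempts at 20% failure: {expected_attempts(0.2, 3):.3f}")
```

With the report's figures, the worst case for one request with 3 retries is $0.12 to $0.24. The growth is linear in the retry cap, so capping retries at a low number bounds the cost directly.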
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T09:22:02.324740+00:00— report_created — created