Report #29145
[cost_intel] Strict JSON Schema validation in structured output mode causes expensive retry loops when the model outputs valid JSON that fails semantic constraints
Use a 'loose' JSON Schema (basic types only) with a post-processing validation layer, or switch to function calling with manual parsing instead of constrained decoding.
Journey Context:
Developers enable OpenAI's JSON mode or Anthropic's structured output with strict schemas (regex patterns, enum constraints, required fields), believing this ensures correctness. However, when the model generates valid JSON that violates a subtle constraint such as a regex (e.g., a date format), the SDK or client may retry automatically, or the developer implements a retry loop. Each retry resends the full prompt, so the entire context is billed again. Worse, with constrained decoding, the model may be forced into tokens that satisfy the constraint but carry the wrong content, wasting the entire generation. The recommended approach is two-phase validation: use a permissive schema (an object with string values) to get the raw output, then validate it with a library such as Zod or Pydantic. If validation fails, pass the error message as context to a targeted correction call with a smaller context, rather than retrying the full request.
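The two-phase idea can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not a definitive implementation: the field names `date` and `status`, the allowed status values, and the helpers `validate_output` and `build_correction_prompt` are hypothetical. Plain standard-library checks stand in for a validation library such as Pydantic or Zod, and the model call itself is omitted.

```python
import json
import re
from datetime import datetime
from typing import List, Optional, Tuple

# Hypothetical constraints, kept out of the schema sent to the model.
ALLOWED_STATUS = {"open", "closed"}
DATE_RE = re.compile(r"\d{4}-\d{2}-\d{2}")


def validate_output(raw: str) -> Tuple[Optional[dict], List[str]]:
    """Phase 2: apply the strict checks locally, after the model has replied."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return None, [f"not valid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return None, ["top-level value must be a JSON object"]

    errors: List[str] = []

    date_value = data.get("date")
    if not isinstance(date_value, str):
        errors.append("date: required string field is missing")
    elif not DATE_RE.fullmatch(date_value):
        errors.append(f"date: expected YYYY-MM-DD, got {date_value!r}")
    else:
        try:
            # Reject well-formed but impossible dates such as 2026-02-30.
            datetime.strptime(date_value, "%Y-%m-%d")
        except ValueError:
            errors.append(f"date: not a real calendar date: {date_value!r}")

    status = data.get("status")
    if not isinstance(status, str):
        errors.append("status: required string field is missing")
    elif status not in ALLOWED_STATUS:
        errors.append(
            f"status: must be one of {sorted(ALLOWED_STATUS)}, got {status!r}"
        )

    return (data if not errors else None), errors


def build_correction_prompt(raw: str, errors: List[str]) -> str:
    """Build a small follow-up prompt from the failed output and its errors only."""
    error_lines = "\n".join(f"- {e}" for e in errors)
    return (
        "The JSON below failed validation. Return corrected JSON only.\n"
        f"Errors:\n{error_lines}\n"
        f"JSON:\n{raw}"
    )
```

A caller would request the output with a permissive schema, pass the raw text to `validate_output`, and only on failure send the string from `build_correction_prompt` as a new, short request. The correction prompt deliberately omits the original context, which is where the token savings over a full retry would come from.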
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T03:18:50.971900+00:00— report_created — created