Report #90680
[cost\_intel] Strict structured output mode \(JSON Schema\) can silently multiply token cost 2-3x: when a deeply nested schema yields invalid or truncated JSON, each automatic retry resends the full context
Simplify schemas \(flatten nested objects, shrink large enums\) and raise the \`max\_tokens\` ceiling so that a long but valid JSON document has room to finish. Validate responses client-side and stop after a bounded number of attempts rather than relying solely on automatic retries. Monitor for \`finish\_reason: "length"\` as a signal of schema-induced truncation.
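A minimal sketch of the client-side check described above. It does not call any API: it only classifies one completion's text and finish reason, then picks the next step. The function names, status labels, and two-attempt cap are illustrative assumptions and not part of any SDK.

```python
import json


def classify_completion(content, finish_reason, required_keys=()):
    """Return 'ok', 'truncated', or 'invalid' for a single completion.

    `content` is the raw text returned by the model and `finish_reason`
    is the value reported alongside it (for example "stop" or "length").
    """
    # "length" means the output hit the token ceiling; retrying with the
    # same ceiling would likely truncate again, so report it separately.
    if finish_reason == "length":
        return "truncated"
    try:
        parsed = json.loads(content)
    except (json.JSONDecodeError, TypeError):
        return "invalid"
    # This sketch assumes the schema's top level is a JSON object.
    if not isinstance(parsed, dict):
        return "invalid"
    if any(key not in parsed for key in required_keys):
        return "invalid"
    return "ok"


def next_action(status, attempt, max_attempts=2):
    """Decide what to do after attempt number `attempt` (1-based)."""
    if status == "ok":
        return "accept"
    # Early termination: stop paying for full-context retries.
    if attempt >= max_attempts:
        return "give_up"
    if status == "truncated":
        return "retry_with_higher_max_tokens"
    return "retry"
```

Separating truncation from malformed output matters for cost: a truncated response calls for a higher \`max\_tokens\` or a smaller schema, whereas repeating the identical request pays for the full input again with little chance of a different result.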
Journey Context:
When \`response\_format: \{type: "json\_schema", json\_schema: \{schema: ..., strict: true\}\}\` is used, the model's output is constrained to the schema. \(Note that \`type: "json\_object"\` is plain JSON mode and does not take a schema.\) However, with deep nesting \(>3 levels\) or many required fields, the response may still come back as malformed JSON \(e.g., missing closing braces\), most commonly because it hit the token limit mid-JSON. Retry layers around the call \(SDK wrappers and common application patterns\) catch the JSONDecodeError and automatically resend the request, often with a backoff. Each retry is billed for the full input tokens again \(which are large because they include the long schema description\) plus the output tokens generated before the failure. One or two failed attempts turn a single 2k-token call into roughly 4-6k tokens. The common mistake is assuming that \`strict: true\` guarantees a parseable response and zero retries; in reality, complex schemas have a non-zero failure rate that scales with output length. Simplifying the schema or raising \`max\_tokens\` to prevent truncation is cheaper than paying for retry loops.
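The retry arithmetic can be sketched as follows. The 1,500 input / 500 output split of the 2k-token call is an assumption made for illustration; the report gives only the totals. The function name and parameters are likewise hypothetical.

```python
def total_tokens(input_tokens, output_tokens, failed_attempts,
                 partial_output_tokens=None):
    """Tokens billed for one logical call that fails `failed_attempts`
    times before a final successful attempt.

    Every attempt, failed or not, pays for the full input again.
    `partial_output_tokens` is the output produced before each failure;
    it defaults to a full-length output (the worst case).
    """
    if partial_output_tokens is None:
        partial_output_tokens = output_tokens
    failed = failed_attempts * (input_tokens + partial_output_tokens)
    success = input_tokens + output_tokens
    return failed + success


# Assumed split of the 2k-token call: 1,500 input + 500 output.
baseline = total_tokens(1500, 500, failed_attempts=0)     # 2000
one_retry = total_tokens(1500, 500, failed_attempts=1)    # 4000
two_retries = total_tokens(1500, 500, failed_attempts=2)  # 6000
```

Because the input, including the schema description, dominates each attempt, the multiplier stays close to the number of attempts even when a failed attempt stops early.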
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T10:47:58.268595+00:00— report_created — created