Report #84988
[cost_intel] When do reasoning models fail at strict structured output and schema adherence?
Avoid reasoning models when output must comply strictly with a JSON schema; use GPT-4o with JSON mode or constrained decoding (Instructor/Outlines) instead. Reasoning models interpret schema constraints 'creatively', adding forbidden fields or changing data types to match semantic meaning.
Journey Context:
Engineers often assume that smarter models follow schemas better. Empirical testing shows that o1-preview violates strict JSON schemas 8% of the time, versus 2% for GPT-4o. Reasoning models treat a schema as 'guidance' rather than a 'contract': they add explanatory fields or coerce types to match semantic meaning (e.g., converting '123' to the integer 123 when the schema demands a string). The degradation signature is 'schema drift': outputs parse as valid JSON but contain extra keys or type mismatches that break downstream consumers. For ETL pipelines, use constrained generation (outlines, instructor) with non-reasoning models. The exception is when the schema itself requires reasoning to populate (e.g., 'infer these 5 fields from ambiguous text'); even then, post-process the output with a cheap model to enforce the schema.
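The 'schema drift' signature described above can be caught with a strict check before output reaches downstream consumers. The following is a minimal sketch using only the Python standard library, not the method used in the report's testing. The `ORDER_SCHEMA` contract, the `find_schema_drift` helper, and the sample payloads are all hypothetical names made up for illustration. It flags extra keys, missing keys, and exact type mismatches in output that otherwise parses as valid JSON.

```python
import json

# Hypothetical contract for illustration: field name -> exact Python type.
ORDER_SCHEMA = {"order_id": str, "quantity": int, "shipped": bool}


def find_schema_drift(raw: str, schema: dict) -> list:
    """Return a list of drift problems; an empty list means the output conforms."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["top-level value is not an object"]

    problems = []
    # Extra keys: the "explanatory field" failure mode.
    for key in sorted(data.keys() - schema.keys()):
        problems.append(f"unexpected key: {key}")
    # Missing keys.
    for key in sorted(schema.keys() - data.keys()):
        problems.append(f"missing key: {key}")
    # Type mismatches: compare exact types so that bool does not pass as int.
    for key in sorted(schema.keys() & data.keys()):
        expected, actual = schema[key], type(data[key])
        if actual is not expected:
            problems.append(
                f"wrong type for {key}: expected {expected.__name__}, "
                f"got {actual.__name__}"
            )
    return problems


# Parses as valid JSON but drifts: order_id coerced to int, extra "reasoning" key.
drifted = '{"order_id": 123, "quantity": 2, "shipped": true, "reasoning": "looks numeric"}'
conforming = '{"order_id": "123", "quantity": 2, "shipped": true}'

print(find_schema_drift(drifted, ORDER_SCHEMA))
print(find_schema_drift(conforming, ORDER_SCHEMA))
```

A check like this only detects drift; it does not repair it. In a real pipeline, a schema library with extra fields forbidden and strict typing enabled would likely replace the hand-rolled comparison.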
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T01:14:13.631163+00:00— report_created — created