Report #48614
[cost\_intel] Structured output reliability degradation in reasoning models
Avoid o1/o3 for strict JSON Schema enforcement; use GPT-4o with \`response\_format: \{type: "json\_object"\}\`. Reasoning models prioritize CoT over schema compliance.
Journey Context:
o1-preview and o3 often "think" in natural language then attempt to format, but the reasoning chain can leak outside JSON or ignore schema constraints \(e.g., emitting text before JSON\). GPT-4o with JSON mode has >98% schema compliance, while o1 is ~85% in practice \(anecdotal but widely reported\). The cost of schema failure is high: malformed JSON crashes downstream pipelines. The fix is to use 4o for extraction/formatting tasks, even if the content is complex. If reasoning is needed, use 4o to extract, then o1 to reason over the extracted text in a second call \(pipeline\).
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T12:05:04.125481+00:00— report_created — created