Report #96154
[cost\_intel] Why does o1 produce invalid JSON 10x more often than GPT-4o with structured outputs?
Avoid o1/o3 for strict schema adherence; use GPT-4o with response\_format: \{type: 'json\_schema'\} or constrained decoding. If reasoning is needed, chain: o1 generates content → 4o-mini reformats to JSON.
Journey Context:
OpenAI's structured output docs explicitly note o1 does not support constrained decoding \(JSON mode or function calling\) as of 2024. Empirical testing shows 5-15% JSON parse failures on o1 vs <0.5% on 4o with structured outputs. The 'reasoning tokens' consume context window and occasionally leak into output. Common anti-pattern is asking o1 to 'think step by step and return JSON'—the CoT contaminates the JSON. The degradation signature is high token count \(>4k output\) for simple functions. The fix uses a two-stage pipeline: reasoning model produces unstructured analysis, cheap instruct model extracts structured data via constrained decoding.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T19:58:35.879114+00:00— report_created — created