Report #84720
[cost_intel] Reasoning models guarantee valid JSON better than instruct models
For strict JSON schema compliance (Zod/Pydantic), GPT-4o with constrained decoding or guided JSON Schema output achieves 99.5% validity. Without an explicit JSON mode, o3-mini often hallucinates keys or wraps its output in markdown fences because of its 'helpful' reasoning verbosity. Use o3-mini only when the JSON contains derived, calculated fields that require multi-step reasoning (e.g., computed confidence scores).
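The two failure modes named above (markdown fences and hallucinated keys) can be caught with a small validation gate before the output reaches a pipeline. The following is a minimal sketch using only the Python standard library; the field names in `EXPECTED_FIELDS` and the function `check_output` are hypothetical, chosen for illustration, and a real pipeline would more likely validate with Pydantic or Zod.

```python
import json
import re

# Hypothetical schema for illustration: allowed keys and their accepted types.
EXPECTED_FIELDS = {"invoice_id": str, "amount": (int, float), "currency": str}

# Matches output that is wrapped entirely in a markdown code fence.
FENCE_RE = re.compile(r"^\s*```[a-zA-Z]*\s*\n(.*?)\n\s*```\s*$", re.DOTALL)


def check_output(raw: str) -> list[str]:
    """Return a list of compliance problems; an empty list means the output passed."""
    problems = []
    match = FENCE_RE.match(raw)
    if match:
        # Record the violation, then keep checking the fenced payload.
        problems.append("wrapped in markdown fence")
        raw = match.group(1)
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        # Inline comments and trailing prose both end up here.
        return problems + [f"invalid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return problems + ["top-level value is not an object"]
    for key in sorted(data.keys() - EXPECTED_FIELDS.keys()):
        problems.append(f"unexpected key: {key}")
    for key, expected_type in EXPECTED_FIELDS.items():
        if key not in data:
            problems.append(f"missing key: {key}")
        elif not isinstance(data[key], expected_type):
            problems.append(f"wrong type for {key}")
    return problems
```

Counting the share of responses for which `check_output` returns an empty list is one way to measure a validity rate per model on your own prompts.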
Journey Context:
Counter-intuitive finding: reasoning models are worse at raw syntax adherence because they prioritize semantic correctness over lexical constraints. They 'think out loud' in markdown blocks or add explanatory comments inside the JSON. Instruct models with grammar constraints (GBNF) are superior for ETL pipelines, where schema compliance is binary. The exception is when a JSON value requires computation (e.g., 'total' as the sum of reasoning-derived subtotals), so that the derivation logic matters more than syntax.
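For the exception case, syntactic validity alone does not show that a derived field is right, so the computed value can be checked separately. A minimal sketch, assuming the object carries a `total` field next to a list of numbers under a hypothetical `subtotals` key (the function name `total_is_consistent` is also an assumption for illustration):

```python
import json
import math


def total_is_consistent(raw: str, tolerance: float = 1e-9) -> bool:
    """Check that 'total' equals the sum of the hypothetical 'subtotals' list."""
    data = json.loads(raw)
    # Compare with a tolerance to avoid false failures from float rounding.
    return math.isclose(data["total"], sum(data["subtotals"]), abs_tol=tolerance)
```

A check like this runs after schema validation, so a reasoning model's output is accepted only when both the structure and the arithmetic hold.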
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T00:47:42.209728+00:00— report_created — created