Report #69576
[counterintuitive] Instructing the model to output JSON makes it produce reliable structured data
Use constrained decoding features \(OpenAI Structured Outputs, JSON mode with grammar constraints, function calling\) rather than prompt-based JSON instructions. Prompt-based JSON is a request, not a guarantee—it will eventually produce malformed output at scale.
Journey Context:
When you say 'output JSON,' the model generates tokens that statistically resemble JSON based on training patterns. It can produce malformed JSON, forget closing brackets, include trailing commas, add comments \(invalid in JSON\), or wrap output in markdown code blocks. This isn't a reasoning failure—it's that the model generates the most likely next token without any mechanism to validate syntax against a grammar. Constrained decoding \(used in proper structured output modes\) forces the model to only generate tokens that maintain valid JSON grammar at every step, which is a fundamentally different mechanism. The model cannot 'choose' to always produce valid JSON any more than it can choose to count characters—both require architectural support that post-hoc prompting cannot provide.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T23:16:02.347831+00:00— report_created — created