Report #68075
[counterintuitive] Why does the model produce invalid JSON, YAML, or XML at scale even with explicit format instructions and examples?
Use constrained decoding or structured output features (OpenAI structured outputs, Guidance, Outlines, instructor) rather than relying on prompt instructions alone for format compliance. Constrain generation with a grammar or schema so that the output is syntactically valid by construction.
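As a rough illustration of what moving the format requirement out of the prompt can look like, the sketch below builds a schema-based `response_format` value in the shape used by OpenAI's structured outputs feature (`type: "json_schema"` with a nested `name`, `strict`, and `schema`). The helper name `build_response_format`, the schema name `ticket_triage`, and the example fields are hypothetical; the exact request layout should be checked against the current API reference before use. No API call is made here.

```python
import json


def build_response_format(name, properties, strict=True):
    """Build a schema-constrained `response_format` value (hypothetical helper).

    Assumption: strict mode expects every property to be listed in
    `required` and `additionalProperties` to be False.
    """
    schema = {
        "type": "object",
        "properties": properties,
        "required": list(properties),
        "additionalProperties": False,
    }
    return {
        "type": "json_schema",
        "json_schema": {"name": name, "strict": strict, "schema": schema},
    }


# Hypothetical schema for illustration only.
response_format = build_response_format(
    "ticket_triage",
    {
        "category": {"type": "string", "enum": ["bug", "billing", "other"]},
        "priority": {"type": "integer"},
    },
)
print(json.dumps(response_format, indent=2))
```

The resulting dictionary would be passed as the `response_format` argument of the request, so the prompt no longer needs to carry instructions such as 'No markdown fences'.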
Journey Context:
Developers write detailed prompts: 'Output valid JSON only. No markdown fences. No extra text.' The model complies most of the time, but at scale (thousands of calls) failures accumulate: trailing commas, unescaped quotes in strings, missing closing brackets, output wrapped in markdown code blocks, explanatory text prepended to the JSON. The cause is structural. The model generates text token by token, and no JSON validator runs during generation. Each token is sampled from a probability distribution, so nothing prevents an invalid token sequence. The model has learned JSON patterns from training data, but a learned pattern is a tendency, not an enforced grammar.
Constrained decoding solves this by restricting the token vocabulary at each position to only those tokens that can still lead to valid output under the specified grammar (JSON schema, regex, etc.). This is a tooling solution, not a prompting solution. The counterintuitive insight is that format compliance is not a reasoning problem that better prompts solve. It is a generation constraint problem that requires a generation-time intervention.
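The masking idea can be sketched with a toy example that uses only the standard library. Everything here is an illustration under stated assumptions: the 'model' is a uniform random sampler standing in for a real language model, the grammar is deliberately tiny (a JSON array of single digits), and the names `allowed_tokens`, `next_state`, and `generate` are hypothetical rather than taken from any library.

```python
import json
import random

VOCAB = ["[", "]", ","] + [str(d) for d in range(10)]
DIGITS = set("0123456789")


def allowed_tokens(state, remaining):
    """Tokens that keep the output a valid prefix and leave room to close it.

    `remaining` counts the tokens that may still be emitted, including this one.
    """
    if state == "start":
        return {"["}
    if state == "after_open":
        # A digit needs one more slot for the closing bracket.
        return {"]"} | (DIGITS if remaining >= 2 else set())
    if state == "after_value":
        # A comma needs two more slots: a digit and the closing bracket.
        return {"]"} | ({","} if remaining >= 3 else set())
    if state == "after_comma":
        return set(DIGITS)
    return set()  # "done": nothing may follow


def next_state(state, token):
    """Advance the grammar state machine by one token."""
    if token == "[":
        return "after_open"
    if token == "]":
        return "done"
    if token == ",":
        return "after_comma"
    return "after_value"  # any digit


def generate(rng, max_tokens=12, constrained=True):
    """Sample tokens; when constrained, mask out grammar-violating tokens first."""
    if max_tokens < 2:
        raise ValueError("max_tokens must be at least 2 to fit '[]'")
    state, out = "start", []
    for step in range(max_tokens):
        remaining = max_tokens - step
        if constrained:
            candidates = sorted(allowed_tokens(state, remaining))
        else:
            candidates = VOCAB  # no mask: any token may be sampled
        token = rng.choice(candidates)
        out.append(token)
        state = next_state(state, token)
        if state == "done":  # a closing bracket ends generation in both modes
            break
    return "".join(out)


def parses_as_array(text):
    """True if `text` is valid JSON and decodes to a list."""
    try:
        return isinstance(json.loads(text), list)
    except ValueError:
        return False


if __name__ == "__main__":
    rng = random.Random(0)
    n = 1000
    free = sum(parses_as_array(generate(rng, constrained=False)) for _ in range(n))
    masked = sum(parses_as_array(generate(rng, constrained=True)) for _ in range(n))
    print(f"unconstrained valid: {free}/{n}")
    print(f"constrained valid:   {masked}/{n}")
```

The unconstrained sampler is far worse than a real model, which gets the format right most of the time, but the point carries over: only the masked variant is valid by construction. Real constrained-decoding libraries apply roughly the same idea to the model's actual tokenizer vocabulary, deriving the state machine from a JSON schema or regex and masking token scores before sampling.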
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T20:44:31.878010+00:00— report_created — created