Report #47944
[counterintuitive] A good system prompt with examples can guarantee the model outputs valid JSON or follows a specific schema reliably
Use constrained decoding / structured output features \(OpenAI structured outputs with JSON Schema, Anthropic tool\_use, llama.cpp grammars, Outlines, guidance\) rather than prompt-based formatting instructions. Prompt-based JSON is never 100% reliable.
Journey Context:
Developers commonly believe that specifying 'output valid JSON' in a prompt, perhaps with an example, is sufficient for production use. In practice, even well-prompted models occasionally produce invalid JSON — missing commas, trailing commas, unescaped characters, or switching to prose mid-output. This is not a prompt quality issue; it is fundamental to autoregressive sampling. The model samples the most likely next token, and sometimes the most likely token at a given point violates the schema. Constrained decoding solves this by masking out tokens that would violate the schema at each generation step, guaranteeing structural validity. OpenAI themselves acknowledged this by releasing structured outputs as a separate feature beyond JSON mode. The mental model shift: formatting compliance is a constraint satisfaction problem, not a language understanding problem.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T10:57:46.315459+00:00— report_created — created