Report #76885
[counterintuitive] Why can't the model reliably produce valid JSON or structured output from a prompt alone?
Use the API's structured output, JSON mode, or function calling features rather than prompting for format compliance. For schemas the API doesn't support natively, parse and validate every response and retry on failure; never assume raw model output is structurally valid on the first try.
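A validation-and-retry loop of this kind might look like the following minimal sketch. It is illustrative only: `generate` is a hypothetical callable standing in for whatever client call returns the model's text, and `REQUIRED_KEYS` is a made-up schema. Only the standard-library `json` module is used.

```python
import json
from typing import Callable

# Hypothetical schema for illustration: required key -> expected Python type.
REQUIRED_KEYS = {"name": str, "age": int}


def extract_json_object(text: str) -> dict:
    """Parse the first JSON object in text, tolerating prose around it."""
    start = text.find("{")
    if start == -1:
        raise ValueError("no JSON object found")
    # raw_decode stops at the end of the first complete JSON value,
    # so trailing text after the object does not cause a failure.
    obj, _end = json.JSONDecoder().raw_decode(text[start:])
    if not isinstance(obj, dict):
        raise ValueError("top-level value is not an object")
    return obj


def validate(obj: dict) -> None:
    """Raise ValueError if a required key is missing or has the wrong type."""
    for key, expected_type in REQUIRED_KEYS.items():
        if key not in obj:
            raise ValueError(f"missing required key: {key}")
        if not isinstance(obj[key], expected_type):
            raise ValueError(f"key {key!r} should be {expected_type.__name__}")


def generate_structured(
    generate: Callable[[str], str], prompt: str, max_attempts: int = 3
) -> dict:
    """Call generate until its reply parses and validates, or attempts run out."""
    last_error = None
    current_prompt = prompt
    for _ in range(max_attempts):
        raw = generate(current_prompt)
        try:
            obj = extract_json_object(raw)
            validate(obj)
            return obj
        except ValueError as exc:  # json.JSONDecodeError is a ValueError subclass
            last_error = exc
            # Feed the failure back so the next attempt can correct it.
            current_prompt = (
                f"{prompt}\n\nYour previous reply was invalid ({exc}). "
                "Reply with only a JSON object."
            )
    raise RuntimeError(f"no valid output after {max_attempts} attempts: {last_error}")
```

The loop checks structure and required fields separately, because a reply can be syntactically valid JSON and still drift from the requested schema. A real schema would likely call for a dedicated validator instead of the hand-rolled type check used here.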
Journey Context:
Developers prompt 'respond in valid JSON' and expect structural compliance. But LLMs generate tokens autoregressively: each token is chosen from a probability distribution conditioned only on the preceding context. Nothing in that process looks ahead to guarantee that the closing bracket will match the opening one, or that all required fields will be present. The model emits what is most probable at each step, which can produce nearly valid but structurally broken output: missing commas, unclosed brackets, trailing text after the JSON, or keys that drift from the requested schema. Prompt wording can reduce these failures but cannot rule them out, because the limitation is a property of autoregressive generation rather than of the prompt: local token prediction does not enforce global structural constraints. This is why API providers built structured output features that constrain the token distribution at each step using grammar-based decoding, so that only tokens that keep the output on a valid schema path can be chosen. The mental model: prompting for structure is asking; constrained decoding is enforcing.
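The difference between asking and enforcing can be shown with a toy decoder. This is a sketch under loud assumptions, not any provider's implementation: `toy_scores` is a hypothetical stand-in for a model, the vocabulary has eight tokens, and `GRAMMAR` is a hand-written state machine for the single schema `{"ok": <boolean>}`.

```python
import json

VOCAB = ["{", "}", '"ok"', ":", "true", "false", "yes", "Sure,"]

# Toy grammar for {"ok": <boolean>}: state -> {allowed token: next state}.
GRAMMAR = {
    "start": {"{": "key"},
    "key": {'"ok"': "colon"},
    "colon": {":": "value"},
    "value": {"true": "close", "false": "close"},
    "close": {"}": "done"},
}


def toy_scores(tokens: list[str]) -> dict[str, float]:
    """Hypothetical model: scores each vocabulary token given the prefix."""
    scores = {tok: 0.0 for tok in VOCAB}
    # The "model" likes a plausible-looking sequence in which the value
    # is "yes", which reads naturally but is not valid JSON.
    preferred = ["{", '"ok"', ":", "yes", "}"]
    if len(tokens) < len(preferred):
        scores[preferred[len(tokens)]] = 2.0
    scores["true"] += 1.0  # runner-up candidate at every step
    return scores


def decode(constrained: bool, max_tokens: int = 5) -> str:
    """Greedy decoding, optionally masking tokens the grammar disallows."""
    tokens: list[str] = []
    state = "start"
    while state != "done" and len(tokens) < max_tokens:
        scores = toy_scores(tokens)
        if constrained:
            # Enforcing: drop every token that would leave the schema path.
            allowed = GRAMMAR[state]
            scores = {tok: s for tok, s in scores.items() if tok in allowed}
        choice = max(scores, key=scores.get)
        tokens.append(choice)
        if constrained:
            state = GRAMMAR[state][choice]
    return "".join(tokens)


def is_valid_json(text: str) -> bool:
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False
```

In this toy setup, `decode(constrained=False)` greedily follows the highest score and returns `{"ok":yes}`, which fails `is_valid_json`. `decode(constrained=True)` applies the same scores but masks disallowed tokens at the value step, falls back to the runner-up, and returns `{"ok":true}`.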
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T11:39:04.056872+00:00— report_created — created