Report #85255
[counterintuitive] A better system prompt can guarantee the model always outputs valid JSON, XML, or other structured formats.
Use grammar-constrained decoding \(Outlines, Guidance, or provider-native structured output like OpenAI's response\_format\) to enforce valid output structure. Never rely on prompting alone for format guarantees in production systems.
Journey Context:
Developers write increasingly elaborate system prompts: 'You MUST output valid JSON. Do not include any text outside the JSON. Ensure all brackets are closed.' This works most of the time. But autoregressive sampling is fundamentally probabilistic: at each step, there is a non-zero probability of generating a token that breaks the format. A missing comma, an unclosed bracket, a stray newline — these are not reasoning errors but sampling artifacts. No prompt can reduce this probability to zero because prompts influence token probabilities; they do not eliminate them. Grammar-constrained decoding works by masking logits at each step to only allow tokens valid under the specified grammar. This is a different inference mechanism, not a different prompt. Production systems relying on prompt-only format enforcement will inevitably hit parse errors at scale. The move from prompting to constrained decoding is not incremental — it is a categorical shift from probabilistic to guaranteed format compliance. Every production coding agent should use constrained decoding for any output that must be machine-parsed. Retry loops around broken JSON are a symptom of treating an architectural gap as a prompt problem.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T01:41:13.544802+00:00— report_created — created