Report #88305
[counterintuitive] Why can't I get the model to always output valid JSON, no matter how carefully I prompt it?
Use grammar-constrained decoding (Outlines, llama.cpp grammars, or provider-native structured output such as OpenAI's Structured Outputs) instead of relying on prompting alone for format compliance.
Journey Context:
The common belief is that the right system prompt plus a few examples will yield 100% valid structured output. In practice, even heavily prompted models occasionally produce malformed JSON: missing closing brackets, trailing commas, unescaped characters. This happens because the model generates one token at a time, and nothing in plain sampling checks the structure built so far. The model cannot 'look ahead' to verify that an opening brace will eventually be closed; the brace gets closed only if the closing token happens to be sampled later. Prompting is therefore a soft constraint: it works most of the time, but its non-zero failure rate compounds with output length and complexity. Grammar-constrained decoding is fundamentally different. At each step it masks the token probability distribution so that only tokens keeping the output valid under a formal grammar can be chosen. This turns format compliance from a probabilistic soft constraint into a hard guarantee of syntactic validity, provided generation is not cut off (for example by a token limit) before the structure is closed.
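The masking step described above can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the six-token vocabulary, the tiny grammar (nested JSON arrays of the digits 0 and 1, not full JSON), the helper names `step`, `tokens_needed`, and `generate`, and the random scores standing in for a real model's logits. It is not how Outlines, llama.cpp, or any provider implements the technique; it only shows the idea of setting disallowed tokens to negative infinity before picking the next token.

```python
import json
import random

# Hypothetical toy vocabulary; real tokenizers have tens of thousands of tokens.
VOCAB = ["[", "]", ",", "0", "1", "<eos>"]


def step(state, token):
    """Advance the grammar state (kind, depth) by one token; None if illegal."""
    kind, depth = state
    if kind == "start":
        return ("open", 1) if token == "[" else None
    if kind == "done":
        return ("end", 0) if token == "<eos>" else None
    if kind == "end":
        return None
    if token == "[" and kind in ("open", "comma"):
        return ("open", depth + 1)
    if token in ("0", "1") and kind in ("open", "comma"):
        return ("value", depth)
    if token == "," and kind == "value":
        return ("comma", depth)
    if token == "]" and kind in ("open", "value"):
        # Closing the outermost array finishes the document.
        return ("done", 0) if depth == 1 else ("value", depth - 1)
    return None


def tokens_needed(state):
    """Minimum number of tokens (including <eos>) to finish from this state."""
    kind, depth = state
    if kind == "end":
        return 0
    if kind == "done":
        return 1
    if kind == "start":
        return 3  # "[", "]", "<eos>"
    return depth + (2 if kind == "comma" else 1)


def generate(max_tokens, rng, constrained):
    """Greedy decoding over random scores (a stand-in for model logits)."""
    state, out = ("start", 0), []
    for used in range(max_tokens):
        remaining = max_tokens - used
        scores = {t: rng.random() for t in VOCAB}
        if constrained:
            for t in VOCAB:
                nxt = step(state, t)
                # Mask tokens that break the grammar, or that would leave too
                # few tokens to close every open bracket before the limit.
                if nxt is None or tokens_needed(nxt) > remaining - 1:
                    scores[t] = float("-inf")
        token = max(scores, key=scores.get)
        if token == "<eos>":
            break
        out.append(token)
        if constrained:
            state = step(state, token)
    return "".join(out)


def is_valid_json(text):
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


if __name__ == "__main__":
    for seed in range(5):
        free = generate(24, random.Random(seed), constrained=False)
        masked = generate(24, random.Random(seed), constrained=True)
        print(repr(free), is_valid_json(free), "|", repr(masked), is_valid_json(masked))
```

In this sketch the constrained path parses on every run because an invalid token can never be selected, while the unconstrained path is valid only by chance. The budget check in `tokens_needed` is one way to handle the truncation caveat: it forces brackets to close before `max_tokens` runs out (assuming `max_tokens` is at least 3). Neither check says anything about whether the values inside the structure are the right ones.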
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T06:48:14.038671+00:00— report_created — created