Report #75713
[counterintuitive] LLM intermittently outputs invalid JSON or breaks JSON schema despite explicit prompt instructions
Use constrained decoding (e.g., grammar-based sampling or JSON mode) rather than relying on prompt instructions to enforce structural formatting.
Journey Context:
The common belief is that adding increasingly desperate prompt instructions ('OUTPUT STRICTLY VALID JSON!!!') will enforce structural compliance. But LLMs are probabilistic text generators: structural constraints like balanced braces are rigid, while token generation is stochastic, so a single low-probability token (like an extra comma) breaks the structure. Constrained decoding intervenes at the sampling step rather than in the prompt or the model architecture. Before each token is sampled, every token that would violate the grammar is masked out of the logits, so the model can only select tokens that conform to the grammar.
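A minimal sketch of the logit-masking idea follows, as an illustration rather than how any particular inference library implements it. Everything in it is an assumption made for the example: the eight-token vocabulary, the hand-written state table `ALLOWED`, and `fake_logits`, which stands in for a real model forward pass by returning random scores. The grammar only covers flat objects such as `{"a":1,"b":2}`.

```python
import json
import math
import random

# Hypothetical toy vocabulary; a real tokenizer has tens of thousands of entries.
KEYS = ['"a"', '"b"']
VALUES = ["1", "2"]
VOCAB = ["{", "}", ",", ":"] + KEYS + VALUES

# Hand-written grammar: for each parser state, the tokens that keep the output valid.
ALLOWED = {
    "start": {"{"},
    "key_or_end": set(KEYS) | {"}"},
    "key": set(KEYS),            # after a comma a key is required (no trailing comma)
    "colon": {":"},
    "value": set(VALUES),
    "comma_or_end": {",", "}"},
}


def next_state(token):
    """Advance the grammar state after emitting `token`."""
    if token == "{":
        return "key_or_end"
    if token == "}":
        return "done"
    if token == ",":
        return "key"
    if token == ":":
        return "value"
    if token in KEYS:
        return "colon"
    return "comma_or_end"  # token is a value


def fake_logits(rng):
    """Stand-in for a model forward pass: random scores over the vocabulary."""
    return [rng.gauss(0.0, 1.0) for _ in VOCAB]


def sample(logits, rng):
    """Softmax sampling; a logit of -inf yields weight 0 and is never chosen."""
    top = max(logits)
    weights = [math.exp(x - top) for x in logits]
    return rng.choices(VOCAB, weights=weights)[0]


def generate(rng, constrained, budget=12):
    """Sample tokens; when constrained, mask grammar-violating tokens first."""
    out, state = [], "start"
    while state != "done" and (constrained or len(out) < budget):
        logits = fake_logits(rng)
        if constrained:
            allowed = ALLOWED[state]
            if len(out) >= budget and "}" in allowed:
                allowed = {"}"}  # budget spent: force the object to close
            logits = [x if tok in allowed else -math.inf
                      for tok, x in zip(VOCAB, logits)]
        token = sample(logits, rng)
        out.append(token)
        if constrained:
            state = next_state(token)
    return "".join(out)


def is_valid_json(text):
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


if __name__ == "__main__":
    rng = random.Random(0)
    for mode in (False, True):
        text = generate(rng, constrained=mode)
        print(f"constrained={mode}: {text} valid={is_valid_json(text)}")
```

The model's scores are never changed for allowed tokens; the mask only removes the invalid ones, which is why the guarantee holds regardless of what the prompt says. Production systems apply the same idea with a compiled grammar or JSON schema over the real tokenizer vocabulary.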
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T09:40:40.613543+00:00— report_created — created