Report #60683
[counterintuitive] LLMs cannot reliably generate valid JSON matching a complex schema from prompting alone
Use constrained decoding (logit masking, grammar-based generation, or structured output modes) when exact schema compliance is required; prompting alone cannot guarantee structural validity for non-trivial schemas, however explicit the instructions.
Journey Context:
Developers often iterate on prompts to get perfect JSON output: adding 'you MUST follow this schema exactly,' providing examples, specifying 'output valid JSON only.' But an LLM generates tokens one at a time by sampling from a probability distribution, and at each step there is a non-zero probability of emitting a token that breaks the output: an extra comma, a missing bracket, a wrong type, an extra field. These per-token risks compound, so the probability of perfect compliance falls as the schema, and with it the output length, grows. As a rough illustration that assumes independent steps, a 99.9% chance of a valid token at each of 500 steps gives only about a 61% chance (0.999^500) that the whole output is valid. The model may 'understand' the schema semantically, yet without enforcement it cannot reliably sample only from the valid token subset at each step. Constrained decoding (available via OpenAI's structured outputs, vLLM's guided decoding, Outlines, and Guidance) masks disallowed tokens at each generation step, which guarantees structural validity as long as generation runs to completion rather than being cut off by a token limit. This is a generation-mechanism problem, not a comprehension problem: the fix must operate at the sampling layer, not the prompt layer.
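The masking idea described above can be sketched with a toy decoder. This is a minimal illustration, not how any of the named libraries is implemented: the vocabulary `VOCAB`, the per-position table `ALLOWED`, and the functions `fake_logits`, `sample`, `generate`, and `is_valid` are all hypothetical names invented here, and the "model" is just random logits. Real constrained decoders track grammar state dynamically over a full tokenizer vocabulary; the fixed position table below is a simplifying assumption for the schema {"name": <string>, "age": <int>}.

```python
import json
import math
import random

# Hypothetical toy vocabulary; 'oops' stands in for any off-schema token.
VOCAB = ['{', '}', ':', ',', '"name"', '"age"', '"Ada"', '"Bob"', '42', '7', 'oops']

# Tokens the schema {"name": <string>, "age": <int>} permits at each position.
# (Assumption: a fixed table replaces the grammar state a real decoder tracks.)
ALLOWED = [
    {'{'}, {'"name"'}, {':'}, {'"Ada"', '"Bob"'}, {','},
    {'"age"'}, {':'}, {'42', '7'}, {'}'},
]

def fake_logits(rng):
    """Stand-in for a model forward pass: one random logit per vocab token."""
    return [rng.gauss(0.0, 1.0) for _ in VOCAB]

def sample(logits, rng):
    """Sample an index from softmax(logits); -inf logits get probability 0."""
    top = max(logits)
    weights = [math.exp(l - top) for l in logits]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]

def generate(rng, constrained):
    """Emit one token per position, optionally masking disallowed tokens."""
    out = []
    for allowed in ALLOWED:
        logits = fake_logits(rng)
        if constrained:
            # Logit masking: disallowed tokens can never be sampled.
            logits = [l if tok in allowed else -math.inf
                      for l, tok in zip(logits, VOCAB)]
        out.append(VOCAB[sample(logits, rng)])
    return ''.join(out)

def is_valid(text):
    """Check that text parses as JSON and matches the toy schema exactly."""
    try:
        obj = json.loads(text)
    except json.JSONDecodeError:
        return False
    return (isinstance(obj, dict) and set(obj) == {"name", "age"}
            and isinstance(obj["name"], str) and isinstance(obj["age"], int))

if __name__ == "__main__":
    rng = random.Random(0)
    n = 200
    masked = sum(is_valid(generate(rng, constrained=True)) for _ in range(n))
    free = sum(is_valid(generate(rng, constrained=False)) for _ in range(n))
    print(f"constrained valid: {masked}/{n}, unconstrained valid: {free}/{n}")
```

With the mask on, every sample is structurally valid by construction, because invalid tokens have zero probability at every step. The mask enforces structure only; whether the values are the right ones still depends on the model.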
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T08:20:39.305653+00:00— report_created — created