Report #80383
[counterintuitive] LLM occasionally outputs invalid JSON or violates a schema despite explicit prompt instructions
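Use constrained decoding (e.g., grammar-based sampling or JSON mode) or programmatic validation with retry loops. Never trust prompt instructions alone to guarantee 100% structural compliance. Some JSON modes guarantee only syntactically valid JSON, not conformance to a particular schema, so validating the parsed result is still worthwhile.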
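A minimal sketch of the validation/retry approach. It assumes a caller-supplied `generate(prompt) -> str` function that wraps whatever model client is in use. `REQUIRED_FIELDS`, `validate`, and `generate_validated` are hypothetical names, and the hand-rolled type check stands in for a real schema validator such as a JSON Schema library.

```python
import json
from typing import Callable, Optional

# Hypothetical schema for illustration: required keys and their exact types.
REQUIRED_FIELDS = {"name": str, "age": int}


def validate(obj: object) -> Optional[str]:
    """Return None if obj matches the toy schema, otherwise a short error message."""
    if not isinstance(obj, dict):
        return "top-level value must be a JSON object"
    for key, expected_type in REQUIRED_FIELDS.items():
        if key not in obj:
            return f"missing required field {key!r}"
        # type() instead of isinstance() so that True is not accepted as an int.
        if type(obj[key]) is not expected_type:
            return f"field {key!r} must be of type {expected_type.__name__}"
    return None


def generate_validated(generate: Callable[[str], str], prompt: str, max_attempts: int = 3) -> dict:
    """Call generate() until its output parses and validates, feeding each error back."""
    current_prompt = prompt
    last_error = ""
    for _ in range(max_attempts):
        raw = generate(current_prompt)
        try:
            obj = json.loads(raw)
        except json.JSONDecodeError as exc:
            last_error = f"invalid JSON: {exc}"
        else:
            error = validate(obj)
            if error is None:
                return obj
            last_error = f"schema violation: {error}"
        # Tell the model what went wrong so the next attempt can correct it.
        current_prompt = (
            f"{prompt}\n\nYour previous output was rejected ({last_error}). "
            "Return only valid JSON."
        )
    raise ValueError(f"no valid output after {max_attempts} attempts: {last_error}")


# Stand-in for a real model call: the first reply has a trailing comma, the second is valid.
replies = iter(['{"name": "Ada", "age": 36,}', '{"name": "Ada", "age": 36}'])
result = generate_validated(lambda p: next(replies), "Describe Ada as JSON.")
print(result)  # {'name': 'Ada', 'age': 36}
```

Feeding the parse or validation error back into the prompt gives the model a chance to fix the specific mistake, but the loop still needs a hard attempt limit: no number of retries is guaranteed to succeed, so the caller has to handle the final `ValueError`.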
Journey Context:
Developers write prompts like 'You MUST output valid JSON' and are confused when the model occasionally outputs JSON with a trailing comma or a missing brace. LLMs are probabilistic sequence generators: every token is sampled from a probability distribution, so there is always a non-zero probability that the model samples a token that breaks the schema. The risk grows with output length. As a simplified model, if each token independently had probability p of breaking the structure, an n-token output would be well-formed with probability (1 - p)^n, which shrinks as n grows. Prompting lowers the per-token error probability but cannot mathematically drive it to zero. Constrained decoding instead restricts sampling, at every step, to tokens that keep the output consistent with a provided grammar, bridging the gap between probabilistic generation and deterministic schema requirements.
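The masking step at the heart of constrained decoding can be sketched with a toy vocabulary and a hand-written finite-state grammar. Everything here is hypothetical and for illustration only: `GRAMMAR`, `VOCAB`, `constrained_sample`, and `sloppy_model` are invented names. Real implementations compile a grammar or JSON Schema against the model's actual tokenizer vocabulary, which this sketch does not attempt.

```python
import math
import random
from typing import Callable, Dict, List

# Toy grammar accepting only {"ok":true} or {"ok":false}, written as a
# finite-state machine: state -> {allowed token: next state}.
GRAMMAR: Dict[int, Dict[str, int]] = {
    0: {"{": 1},
    1: {'"ok"': 2},
    2: {":": 3},
    3: {"true": 4, "false": 4},
    4: {"}": 5},
}
ACCEPT_STATE = 5

# Toy vocabulary, including tokens ("," and "]") that would break the structure.
VOCAB = ["{", "}", '"ok"', ":", "true", "false", ",", "]"]


def constrained_sample(logits_fn: Callable[[List[str]], Dict[str, float]], rng: random.Random) -> str:
    """Sample tokens, masking any token the grammar does not allow in the current state."""
    state, output = 0, []
    while state != ACCEPT_STATE:
        logits = logits_fn(output)
        allowed = GRAMMAR[state]
        # Mask: disallowed tokens get logit -inf, i.e. probability zero.
        masked = {t: (logits[t] if t in allowed else -math.inf) for t in VOCAB}
        # Softmax over the surviving tokens only (subtract the max for stability).
        top = max(masked.values())
        weights = {t: math.exp(v - top) for t, v in masked.items() if v != -math.inf}
        token = rng.choices(list(weights), weights=list(weights.values()))[0]
        output.append(token)
        state = allowed[token]
    return "".join(output)


def sloppy_model(prefix: List[str]) -> Dict[str, float]:
    """Stand-in for a real model: ignores the prefix and strongly prefers broken tokens."""
    logits = {t: 0.0 for t in VOCAB}
    logits[","] = 10.0
    logits["]"] = 10.0
    return logits


rng = random.Random(0)
for _ in range(3):
    print(constrained_sample(sloppy_model, rng))  # always {"ok":true} or {"ok":false}
```

Even though the stand-in model puts almost all of its probability mass on `,` and `]`, every sample is well-formed, because those tokens are removed before sampling rather than discouraged by instructions.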
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T17:31:48.693096+00:00— report_created — created