Report #59957
[counterintuitive] LLM occasionally outputs invalid JSON or violates a schema despite explicit prompt instructions
Use constrained decoding (e.g., grammar-based sampling, JSON mode, or structured outputs like function calling/Pydantic integration) rather than relying on prompt instructions alone for format adherence.
Journey Context:
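Developers often assume that adding 'YOU MUST OUTPUT VALID JSON' to the prompt will fix formatting errors. The model samples each token from a probability distribution, left to right, and has no built-in parser checking its own output as it goes. Instructions make well-formed output more likely but never certain: a token that breaks the JSON (an unescaped quote, a trailing comma, a missing brace) usually keeps a small nonzero probability, and once it is sampled the model cannot backtrack. Constrained decoding works at the sampling step, not in the model's weights or architecture: before each token is chosen, the logits of tokens that the grammar or schema does not allow are masked out, so those tokens cannot be sampled. This guarantees that whatever is emitted follows the grammar, but a response cut off by a token limit can still be incomplete, and a plain JSON mode typically guarantees syntax only, not conformance to a specific schema.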
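The masking idea can be sketched with a toy example. Everything here is an illustrative assumption and not any library's API: the seven-token vocabulary, the hand-written `GRAMMAR` state table, and `fake_logits`, which stands in for a model forward pass and is deliberately biased toward an invalid token. Real implementations compile a JSON schema or grammar into a much larger state machine over the tokenizer's vocabulary.

```python
import json
import math
import random

# Toy vocabulary (hypothetical); a real tokenizer has tens of thousands of entries.
VOCAB = ['{', '}', '"key"', ':', '1', ',', 'oops']

# Hand-written grammar for objects such as {"key":1,"key":1}.
# Each state maps its allowed tokens to the state they lead to.
GRAMMAR = {
    "start": {'{': "key"},
    "key": {'"key"': "colon"},
    "colon": {':': "value"},
    "value": {'1': "after"},
    "after": {',': "key", '}': "done"},
}


def fake_logits(rng):
    """Stand-in for a model: random scores, strongly biased toward 'oops'."""
    logits = [rng.gauss(0.0, 1.0) for _ in VOCAB]
    logits[VOCAB.index('oops')] += 5.0
    return logits


def sample(logits, allowed, rng):
    """Softmax-sample among allowed tokens only.

    Dropping a token from the candidate set has the same effect as
    setting its logit to -inf before the softmax.
    """
    candidates = [i for i, tok in enumerate(VOCAB) if tok in allowed]
    top = max(logits[i] for i in candidates)
    weights = [math.exp(logits[i] - top) for i in candidates]
    return VOCAB[rng.choices(candidates, weights=weights)[0]]


def generate(constrained, seed=0, max_tokens=40):
    """Return (text, finished). finished is False on an illegal token or truncation."""
    rng = random.Random(seed)
    state, out = "start", []
    while state != "done" and len(out) < max_tokens:
        allowed = GRAMMAR[state] if constrained else VOCAB
        tok = sample(fake_logits(rng), allowed, rng)
        out.append(tok)
        if tok not in GRAMMAR[state]:
            break  # only reachable when unconstrained: output is already invalid
        state = GRAMMAR[state][tok]
    return "".join(out), state == "done"


if __name__ == "__main__":
    print("unconstrained:", generate(constrained=False))
    text, finished = generate(constrained=True)
    print("constrained:  ", text, "| parses:", finished and bool(json.loads(text)))
```

In constrained mode the invalid token can never appear, however strongly the scores favor it. The `finished` flag makes the remaining failure mode explicit: if `max_tokens` runs out before the closing brace, the text is grammatical so far but still not parseable JSON.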
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T07:07:32.552987+00:00— report_created — created