Report #50783
[counterintuitive] Why the model still outputs invalid JSON or violates schemas despite explicit format instructions
Use structured outputs or constrained decoding (JSON mode, grammar-constrained generation) rather than prompt-based format instructions; format compliance is a constraint satisfaction problem that requires architectural enforcement, not better prompting.
Journey Context:
The common approach is to specify the JSON schema in the prompt and add increasingly detailed instructions and examples. This fails because unconstrained text generation selects each token from the full vocabulary, so a single bad token choice can break the entire structure. The model has no architectural mechanism for enforcing structural constraints during generation; it can only predict likely next tokens. Constrained decoding works fundamentally differently: at each step it masks the vocabulary so that only tokens that keep the output structurally valid can be selected, turning format compliance from a probabilistic language problem into a deterministic constraint satisfaction problem. This is why JSON mode and structured outputs achieve near-100% format compliance, while prompt-based approaches can raise the compliance rate but cannot guarantee it, however detailed the instructions are. Note that JSON mode on its own typically guarantees only syntactically valid JSON; conformance to a specific schema requires schema-constrained structured outputs.
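The masking step described above can be illustrated with a toy sketch. Everything in it is an assumption made for illustration: the eight-token vocabulary, the hard-coded `toy_scores` function standing in for model logits, and the finite `VALID_OUTPUTS` set standing in for a real grammar or schema. Production implementations compile a grammar or JSON Schema into a token-level automaton; this sketch only shows why filtering candidates before selection rules out the single bad token that breaks unconstrained output.

```python
import json

# Toy vocabulary (assumption); real tokenizers have tens of thousands of entries.
VOCAB = ["{", "}", '"ok"', ":", "true", "false", ",", "oops"]

# Hypothetical stand-in for a grammar: the complete set of accepted outputs.
VALID_OUTPUTS = ['{"ok":true}', '{"ok":false}']


def toy_scores(prefix):
    """Stand-in for model logits: sensible everywhere except right after ':'."""
    scores = {tok: 0.0 for tok in VOCAB}
    if prefix == "":
        scores["{"] = 5.0
    elif prefix == "{":
        scores['"ok"'] = 5.0
    elif prefix == '{"ok"':
        scores[":"] = 5.0
    elif prefix == '{"ok":':
        scores["oops"] = 5.0  # the single bad choice the "model" prefers
        scores["true"] = 4.0
    else:
        scores["}"] = 5.0
    return scores


def allowed(prefix, token):
    """A token is allowed if prefix + token can still grow into a valid output."""
    candidate = prefix + token
    return any(out.startswith(candidate) for out in VALID_OUTPUTS)


def decode(constrained, max_steps=8):
    """Greedy decoding; when constrained, mask disallowed tokens before picking."""
    text = ""
    for _ in range(max_steps):
        scores = toy_scores(text)
        if constrained:
            scores = {t: s for t, s in scores.items() if allowed(text, t)}
            if not scores:  # no token keeps the output valid
                break
        text += max(scores, key=scores.get)
        if text.endswith("}"):
            break
    return text


if __name__ == "__main__":
    print("unconstrained:", decode(constrained=False))
    print("constrained:  ", decode(constrained=True))
    print("parsed:       ", json.loads(decode(constrained=True)))
```

In this sketch the unconstrained run follows the highest score into the `oops` token and produces text that `json.loads` rejects, while the constrained run never sees `oops` as a candidate and falls back to the best remaining valid token. The scoring function is identical in both runs; only the candidate set changes.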
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T15:43:32.803546+00:00— report_created — created