Report #96347
[counterintuitive] Model outputs invalid JSON or schema — prompt it to be more careful or add format instructions
Use constrained decoding (grammar-based sampling via Outlines, Guidance, or llama.cpp grammars) to enforce structural validity at generation time; do not rely on prompts or retry loops for format compliance.
Journey Context:
Developers often assume that stronger prompting ('ALWAYS output valid JSON', 'Double-check your output') can prevent format errors. But autoregressive models generate left to right and cannot revise tokens they have already emitted. Once the model emits an opening brace and starts down a syntactic path, it cannot backtrack: there is no mechanism for noticing a structural error two tokens back and correcting it. Prompts only shift token probabilities, so a 'be careful' instruction can lower the error rate but cannot guarantee that every brace is closed or every string is quoted. Retry loops are expensive and unreliable for the same reason: each attempt is another sample from the same distribution, and the model tends to repeat the same structural errors. Constrained decoding intercepts token selection at each step and masks out tokens that would violate the grammar, so structural validity is enforced by the decoder itself rather than left to the model.
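The masking step described above can be illustrated with a toy sketch. This is not the API of Outlines, Guidance, or llama.cpp: the vocabulary, the hand-written state machine, and the names `allowed`, `next_state`, `fake_logits`, and `generate` are all hypothetical, and the "model" is just a random scorer with no knowledge of JSON. The point is only that filtering candidate tokens through a grammar at every step yields parseable output regardless of what the scorer prefers.

```python
import json
import random

# Toy vocabulary (hypothetical): whole JSON fragments plus one junk token.
KEYS = {'"name"', '"age"'}
VALUES = {"1", "42"}
VOCAB = ["{", "}", ":", ",", "oops", *sorted(KEYS), *sorted(VALUES)]


def allowed(state, n_emitted, max_tokens):
    """Tokens the toy grammar (flat object, integer values) permits next."""
    if state == "start":
        return {"{"}
    if state == "after_open":
        return KEYS | {"}"}
    if state == "after_key":
        return {":"}
    if state == "after_colon":
        return VALUES
    if state == "after_value":
        # Once over budget, the only legal move is to close the object.
        return {"}"} if n_emitted >= max_tokens else {",", "}"}
    if state == "after_comma":
        return KEYS
    raise ValueError(f"unknown state: {state}")


def next_state(token):
    """Grammar state after emitting `token` (the token alone decides it here)."""
    if token in KEYS:
        return "after_key"
    if token in VALUES:
        return "after_value"
    return {"{": "after_open", "}": "done",
            ":": "after_colon", ",": "after_comma"}[token]


def fake_logits(rng):
    """Stand-in for a model: random scores over the vocabulary."""
    return {tok: rng.random() for tok in VOCAB}


def generate(rng, constrained, max_tokens=12):
    """Greedy decoding, optionally masking tokens the grammar forbids."""
    state, out = "start", []
    while state != "done":
        scores = fake_logits(rng)
        if constrained:
            legal = allowed(state, len(out), max_tokens)
            scores = {t: s for t, s in scores.items() if t in legal}
        token = max(scores, key=scores.get)  # best-scoring surviving token
        out.append(token)
        if constrained:
            state = next_state(token)
        elif token == "}" or len(out) >= max_tokens:
            state = "done"
    return " ".join(out)


def is_valid_json(text):
    """True if `text` parses as JSON."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


if __name__ == "__main__":
    rng = random.Random(0)
    for mode in (False, True):
        text = generate(rng, constrained=mode)
        print(f"constrained={mode}: {text!r} valid={is_valid_json(text)}")
```

The constrained path never needs a retry, because an invalid token is never a candidate in the first place. Note that the mask guarantees structure only: the values are still whatever the scorer picks, so semantic checks on the parsed result remain the caller's job.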
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T20:18:08.900902+00:00— report_created — created