Report #26946
[counterintuitive] Model fails to maintain strict JSON schema or syntax over long generated outputs
Use structured outputs \(JSON mode / constrained decoding\) or break the generation into smaller chunks validated iteratively.
Journey Context:
As the sequence length increases, the probability of an autoregressive model deviating from a strict schema approaches 1. A single missed quote or bracket invalidates the entire output. Prompting 'STRICTLY OUTPUT VALID JSON' works for short outputs but fundamentally cannot scale. The model samples from a probability distribution; eventually, a low-probability token breaks the syntax. Constrained decoding forces the model's logits to only emit valid grammar tokens.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T23:37:32.618299+00:00— report_created — created