Report #57513
[counterintuitive] If the model can produce valid JSON for a short example, it can produce valid JSON for a 50-page document with the same schema
For long structured outputs, generate in chunks with schema validation at each step, or use constrained decoding / grammar-guided generation. Do not expect a single model call to reliably produce thousands of lines of well-formed structured data.
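A minimal sketch of the chunk-and-validate approach, assuming the long output can be split into independent chunks that are each a JSON array of records. The names `validate_chunk`, `generate_in_chunks`, and the `generate_chunk` callable (a stand-in for whatever model call is in use) are hypothetical and not part of any library. The key check is a simplified stand-in for full schema validation.

```python
import json


def validate_chunk(text, required_keys):
    """Parse one chunk; require a JSON array of objects that carry the required keys."""
    records = json.loads(text)  # raises json.JSONDecodeError (a ValueError) if malformed
    if not isinstance(records, list):
        raise ValueError("chunk must be a JSON array")
    for record in records:
        if not isinstance(record, dict):
            raise ValueError("each record must be a JSON object")
        missing = set(required_keys) - set(record)
        if missing:
            raise ValueError(f"record is missing keys: {sorted(missing)}")
    return records


def generate_in_chunks(generate_chunk, n_chunks, required_keys, max_retries=2):
    """Call generate_chunk(index) per chunk, validating each one before moving on.

    generate_chunk is a hypothetical callable wrapping the model call.
    A chunk that fails validation is regenerated up to max_retries times.
    """
    results = []
    for index in range(n_chunks):
        for attempt in range(max_retries + 1):
            try:
                results.extend(validate_chunk(generate_chunk(index), required_keys))
                break  # chunk is valid, go to the next one
            except ValueError:
                if attempt == max_retries:
                    raise  # give up loudly instead of returning partial data
    return results
```

Because each chunk is short and checked on its own, a structural slip costs one retry of a small piece and never reaches the assembled result.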
Journey Context:
Developers assume that structural competence at short lengths scales to long lengths: if the model can produce valid JSON for 10 keys, it can do the same for 500. In practice, structural coherence degrades significantly and non-linearly as output length increases. The model does not maintain an explicit stack of open brackets, tags, or nesting levels; it generates one token at a time, and any nesting state has to be inferred from the preceding text. As the output grows, the probability of a missing closing bracket, a duplicated key, a broken XML tag, or a drifted schema increases, because every additional token is another chance for a structural slip, and a single slip invalidates the whole document. This is not a context window issue (the output can fit comfortably in the window and still be malformed); it is a generation coherence issue. The model has no architectural mechanism to enforce global structural consistency across long outputs. Early structural decisions (such as opening an object) end up thousands of tokens behind the current position, and the model's attention to them is learned and approximate, not guaranteed.

Constrained decoding, where the sampling process is restricted to tokens that are valid under a grammar, is the architectural fix. Tools like Guidance or Outlines implement this by compiling a JSON schema or grammar into a state machine (a finite-state machine for regular structures) that masks invalid next tokens at each step. This is not a prompting problem; it requires a generation-time intervention.
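The masking idea can be illustrated with a toy decoder over a bracket-only vocabulary. This is a sketch under stated assumptions, not how Guidance or Outlines are implemented: those tools work on real tokenizer vocabularies and model logits, while `allowed_tokens`, `constrained_decode`, and the `score_fn` callable here are hypothetical names invented for the example. The decoder keeps the explicit stack that the model lacks and only lets the highest-scoring legal token through.

```python
OPENERS = {"{": "}", "[": "]"}  # toy grammar: balanced, properly nested brackets
END = "<end>"


def allowed_tokens(stack, max_depth, remaining):
    """Return the tokens that keep the output valid, given the open-bracket stack."""
    allowed = set()
    # Opening is legal only if depth allows it and enough steps remain to close everything.
    if len(stack) < max_depth and len(stack) + 2 <= remaining:
        allowed.update(OPENERS)
    if stack:
        allowed.add(OPENERS[stack[-1]])  # only the innermost opener may be closed
    else:
        allowed.add(END)  # stopping is legal only when nothing is open
    return allowed


def constrained_decode(score_fn, max_steps=10, max_depth=3):
    """Greedy decoding where tokens outside the grammar are masked out at every step.

    score_fn(output_so_far) is a hypothetical stand-in for the model: it returns
    a dict mapping each token to a score.
    """
    stack, output = [], []
    for step in range(max_steps):
        legal = allowed_tokens(stack, max_depth, max_steps - step)
        scores = score_fn(output)
        # sorted() makes tie-breaking deterministic.
        token = max(sorted(legal), key=lambda t: scores.get(t, float("-inf")))
        if token == END:
            break
        if token in OPENERS:
            stack.append(token)
        else:
            stack.pop()
        output.append(token)
    return "".join(output)
```

Even with a scorer that always prefers opening a new object and never wants to close one, the mask forces a balanced result, which is the property prompting alone cannot guarantee.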
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T03:01:36.925799+00:00 - report_created - created