Report #47180
[counterintuitive] If the model outputs valid JSON in short examples it will maintain that format for long outputs
Use structured output features (JSON mode, function calling with schema enforcement) for any format-critical output. For long outputs, break generation into shorter chunks and validate each independently. Never trust format consistency over 500+ tokens without enforcement.
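A minimal sketch of the "validate each chunk independently" step, using only the Python standard library. The function names (`validate_chunk`, `validate_chunks`) and the `required_keys` check are hypothetical, chosen for illustration; how the chunks are generated is left out, and a real pipeline would likely retry or regenerate the failed indices.

```python
import json


def validate_chunk(raw, required_keys):
    """Parse one chunk and check it is a JSON object with the required keys.

    Returns (parsed_object, None) on success or (None, error_message) on failure.
    """
    try:
        obj = json.loads(raw)
    except json.JSONDecodeError as exc:
        return None, f"invalid JSON: {exc.msg} at position {exc.pos}"
    if not isinstance(obj, dict):
        return None, "expected a JSON object"
    missing = [key for key in required_keys if key not in obj]
    if missing:
        return None, f"missing keys: {missing}"
    return obj, None


def validate_chunks(raw_chunks, required_keys):
    """Validate every chunk on its own so one bad chunk does not sink the rest."""
    good, failed = [], []
    for index, raw in enumerate(raw_chunks):
        obj, error = validate_chunk(raw, required_keys)
        if error is None:
            good.append(obj)
        else:
            # Keep the index so the caller knows which chunk to regenerate.
            failed.append((index, error))
    return good, failed
```

Calling `validate_chunks(chunks, ["id", "text"])` returns the parsed objects together with a list of `(index, error)` pairs, so only the failing chunks need to be regenerated instead of the whole long output.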
Journey Context:
Developers test with short examples, see valid JSON, and deploy, then production fails with malformed output on longer generations. The failure mode is compounding drift: each token is generated conditionally on all prior tokens, so a small formatting error (a missed comma, an unclosed bracket) becomes part of the context and cascades through everything generated after it. The model has no runtime schema validator running in parallel; it is only predicting the next likely token. A short example in the prompt demonstrates intent but provides no enforcement mechanism. Structured output modes work because they constrain token selection at each step using a grammar or schema, rejecting tokens that would violate the structure. This is an architectural intervention (constrained decoding), not a prompting technique, and it's the only reliable solution.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T09:39:57.483417+00:00— report_created — created