Report #95496
[counterintuitive] Why does the model break JSON or output-schema format even when given a detailed schema and examples?
Use structured outputs, JSON mode, or function-calling features that constrain the decoder. Plain JSON mode typically guarantees syntactically valid JSON, not conformance to your schema. When none of these is available, keep the schema as simple as possible and always validate and repair the output in code. Never rely on prompt-based schema instructions alone in production pipelines.
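The following is a minimal sketch of the "validate and repair in code" step. It uses Python with only the standard library. The schema (`SCHEMA`), the helper names (`extract_json`, `light_repair`, `validate`, `parse_model_output`), and the particular repairs are illustrative assumptions. They are not part of any provider's API, so adapt them to your own output contract.

```python
import json
import re

# Hypothetical flat schema for illustration: field name -> required Python type.
# Keeping it flat and small follows the "simplest possible schema" advice.
SCHEMA = {"name": str, "age": int, "tags": list}

FENCE = "`" * 3  # a Markdown code fence, built this way to avoid literal backticks


def extract_json(text: str) -> str:
    """Pull the most likely JSON object out of raw model text.

    Handles two common failure modes: the object wrapped in a Markdown
    code fence, and chatty prose before or after the object.
    """
    fenced = re.search(FENCE + r"(?:json)?\s*(.*?)" + FENCE, text, re.DOTALL)
    if fenced:
        text = fenced.group(1)
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found")
    return text[start : end + 1]


def light_repair(raw: str) -> str:
    """Naive repair: drop trailing commas before a closing } or ].

    Caveat: this regex does not understand string literals, so it could
    also alter a string value that happens to contain ",}" or ",]".
    """
    return re.sub(r",\s*([}\]])", r"\1", raw)


def validate(obj: object, schema: dict) -> list:
    """Return human-readable schema violations (an empty list means valid)."""
    if not isinstance(obj, dict):
        return ["top-level value is not an object"]
    errors = []
    for key, typ in schema.items():
        if key not in obj:
            errors.append(f"missing key: {key}")
            continue
        value = obj[key]
        # bool is a subclass of int in Python, so reject it explicitly for int fields.
        if not isinstance(value, typ) or (typ is int and isinstance(value, bool)):
            errors.append(f"{key}: expected {typ.__name__}, got {type(value).__name__}")
    extra = sorted(set(obj) - set(schema))
    if extra:
        errors.append(f"unexpected keys: {extra}")
    return errors


def parse_model_output(text: str, schema: dict = SCHEMA) -> tuple:
    """Extract, repair, parse, and validate. Returns (obj or None, errors)."""
    try:
        obj = json.loads(light_repair(extract_json(text)))
    except ValueError as exc:  # json.JSONDecodeError is a subclass of ValueError
        return None, [f"parse error: {exc}"]
    errors = validate(obj, schema)
    return (obj if not errors else None), errors
```

If `errors` is non-empty, one reasonable next step is to re-prompt the model once with the error list appended. If the retry also fails, the pipeline should fail loudly rather than pass unvalidated data downstream. For anything beyond a flat schema, a dedicated validator such as the `jsonschema` package is likely a better fit than hand-rolled checks.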
Journey Context:
Developers put detailed JSON schemas or type definitions in the prompt and expect the model to honor them the way a compiler enforces a type system. The model does not parse the schema and generate against it programmatically. Instead, it pattern-matches against schemas similar to ones it saw during training, so novel or complex schemas fail more often. Even with familiar schemas, long generations can drift out of format as the schema definition recedes further back in the context and attention moves away from it. The model has no separate "format enforcement" module. Format compliance is just another next-token prediction, and it competes with generating the content itself. This is precisely why constrained decoding (structured outputs) exists. It restricts which tokens the decoder is allowed to emit at each step, something prompt instructions fundamentally cannot do.
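To make the last point concrete, here is a toy Python sketch of constrained decoding. At each step, the sketch sets the score of any token that would take the output outside the allowed format to negative infinity, then picks the next token. The vocabulary, the scores, and the two-string "schema" are all hypothetical. This is not how any particular provider implements structured outputs; it shows only the core masking idea.

```python
import math

# Toy vocabulary of text fragments. Real tokenizers have tens of thousands of
# tokens; this list is a hypothetical stand-in.
VOCAB = ['{"ok": ', "true", "false", "}", "Sure! ", "yes", '"']

# The only complete outputs the "schema" permits. Real systems compile a
# JSON Schema or grammar into an automaton instead of listing strings.
ALLOWED = ['{"ok": true}', '{"ok": false}']

# Hypothetical model scores: it prefers chatty, off-format tokens.
SCORES = {'{"ok": ': 1.0, "true": 0.8, "false": 0.2, "}": 0.5,
          "Sure! ": 3.0, "yes": 2.5, '"': 0.1}


def toy_logits(prefix: str) -> list:
    """Stand-in for a model forward pass (ignores `prefix` to stay tiny)."""
    return [SCORES[tok] for tok in VOCAB]


def is_valid_prefix(text: str) -> bool:
    """True if `text` can still be extended into some allowed output."""
    return any(target.startswith(text) for target in ALLOWED)


def decode(constrained: bool, max_steps: int = 5) -> str:
    """Greedy decoding, optionally with a validity mask applied to the logits."""
    out = ""
    for _ in range(max_steps):
        if out in ALLOWED:  # reached a complete, valid output
            break
        logits = toy_logits(out)
        if constrained:
            # The key step: tokens that would leave the valid space get -inf,
            # so they can never be chosen, no matter how the model scores them.
            logits = [score if is_valid_prefix(out + tok) else -math.inf
                      for tok, score in zip(VOCAB, logits)]
            if all(score == -math.inf for score in logits):
                raise RuntimeError("no valid continuation")
        best = max(range(len(VOCAB)), key=lambda i: logits[i])
        out += VOCAB[best]
    return out


print(decode(constrained=False))  # chatty text that never becomes valid JSON
print(decode(constrained=True))   # {"ok": true}
```

Without the mask, the toy model's preference for "Sure! " wins at every step. With the mask, the same scores yield `{"ok": true}`. Prompt instructions can only shift the model's scores. The mask removes invalid options entirely, which is the difference the paragraph above describes.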
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T18:52:10.043328+00:00— report_created — created