Report #88079
[counterintuitive] Why does structured output produce JSON that parses correctly but is semantically wrong?
Always validate model outputs against a schema that includes semantic constraints, not just syntactic structure. Use JSON Schema with additional validation keywords (enum constraints, minimum/maximum values, pattern matching). Don't assume that specifying a schema or enabling structured output mode makes the model respect business-logic invariants. Add a validation layer between model output and downstream consumption.
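A minimal sketch of such a validation layer, using only the Python standard library. The field names (`age`, `email`, `status`), the allowed status values, the age range, and the email pattern are all hypothetical stand-ins for real domain rules. In practice a JSON Schema validator or a typed model library would likely replace the hand-written checks.

```python
import json
import re

# Hypothetical domain rules; substitute your own.
ALLOWED_STATUSES = {"active", "suspended", "closed"}
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")  # deliberately loose pattern


def validate_user(raw: str) -> dict:
    """Parse model output and enforce semantic constraints before it is used."""
    data = json.loads(raw)  # syntactic check; raises json.JSONDecodeError (a ValueError)
    if not isinstance(data, dict):
        raise ValueError("top-level value must be a JSON object")

    errors = []

    age = data.get("age")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if not isinstance(age, int) or isinstance(age, bool) or not 0 <= age <= 150:
        errors.append("age must be an integer between 0 and 150")

    email = data.get("email")
    if not isinstance(email, str) or not EMAIL_RE.match(email):
        errors.append("email must look like an email address")

    status = data.get("status")
    if not isinstance(status, str) or status not in ALLOWED_STATUSES:
        errors.append("status must be one of " + ", ".join(sorted(ALLOWED_STATUSES)))

    if errors:
        # Report every violation at once so a retry prompt can include all of them.
        raise ValueError("; ".join(errors))
    return data
```

Calling `validate_user` on the raw model response before handing the result to downstream code keeps invalid values from propagating; on `ValueError`, the caller can reject the response or re-prompt with the error text.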
Journey Context:
Developers see structured output features (OpenAI's function calling, JSON mode, structured outputs) and assume the model will produce semantically valid data as long as a schema is specified. In reality, these features target syntactic validity: JSON mode yields parseable JSON, and schema-constrained modes yield JSON that matches the declared shape. None of them guarantees semantic validity, meaning values that make sense for your domain. A model will happily produce {"age": -5, "email": "not-an-email", "status": "nonexistent_status"} if the schema only specifies types without constraints. The model doesn't "understand" your schema the way a programmer does; it pattern-matches against the shape. Even with constrained decoding (which forces valid JSON structure), the model can still fill valid-shaped slots with semantically invalid values. This is a fundamental limitation: semantic validity requires domain understanding, not just structural compliance.
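The gap between shape and meaning can be illustrated with a type-only check. This is a sketch: `SHAPE` and `matches_shape` are hypothetical names standing in for a schema that declares types but no constraints.

```python
import json

# Type-only "schema": field name -> expected Python type (hypothetical).
SHAPE = {"age": int, "email": str, "status": str}


def matches_shape(payload: dict, shape: dict) -> bool:
    """Return True if every field is present with the expected type.

    The values themselves are never inspected, which is the point of the demo.
    """
    return all(isinstance(payload.get(key), typ) for key, typ in shape.items())


raw = '{"age": -5, "email": "not-an-email", "status": "nonexistent_status"}'
payload = json.loads(raw)                  # parses without error
shape_ok = matches_shape(payload, SHAPE)   # structurally fine, semantically useless
```

Here `shape_ok` is `True` even though none of the three values is usable: the payload parses and every field has the declared type, so nothing at the structural level signals a problem.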
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T06:25:43.150889+00:00— report_created — created