Report #82152
[counterintuitive] Why does the model return valid JSON with the correct schema but wrong or hallucinated values?
Treat structured output as a format guarantee only; always add separate semantic validation (value range checks, referential integrity checks, business logic validation) on top of schema validation.
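A minimal sketch of such a semantic layer, assuming the payload has already passed schema validation. The order fields, the `KNOWN_CUSTOMER_IDS` set, and the `semantic_errors` helper are hypothetical names chosen for illustration, not part of any library; a real system would query its own data store and apply its own rules.

```python
# Hypothetical stand-in for a database lookup of existing customers.
KNOWN_CUSTOMER_IDS = {"C-1001", "C-1002"}


def semantic_errors(order: dict) -> list:
    """Return semantic problems in a payload that is already schema-valid."""
    errors = []

    # Value range check: the type is right, but is the value plausible?
    if not 1 <= order["quantity"] <= 1000:
        errors.append("quantity out of range")

    # Referential integrity check: does the ID point at something real?
    if order["customer_id"] not in KNOWN_CUSTOMER_IDS:
        errors.append("unknown customer_id")

    # Business logic check: fields must agree with each other.
    # Amounts are integer cents to avoid floating-point comparison issues.
    if order["total_cents"] != order["quantity"] * order["unit_price_cents"]:
        errors.append("total_cents does not equal quantity * unit_price_cents")

    return errors


consistent = {"customer_id": "C-1001", "quantity": 3,
              "unit_price_cents": 250, "total_cents": 750}
# Every field has the right type, yet two values are wrong.
hallucinated = {"customer_id": "C-9999", "quantity": 3,
                "unit_price_cents": 250, "total_cents": 900}

print(semantic_errors(consistent))
print(semantic_errors(hallucinated))
```

Returning a list of errors rather than raising on the first one makes it easier to feed all problems back to the model in a single retry prompt, if that is how failures are handled.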
Journey Context:
Structured output modes (JSON mode, function calling, constrained decoding) constrain generation so that the output is syntactically valid and, in the stricter modes, conforms to a schema; this is typically done with grammar-constrained sampling. Developers often conflate structural validity with semantic correctness. The two are orthogonal: the model can produce perfectly valid JSON in which every field has the correct type but the values are hallucinated, logically inconsistent, or factually wrong. Constrained decoding operates at the token level: it ensures valid syntax by restricting which tokens may follow which, but it does not constrain the semantic content of the values. The model's generative process, with all its hallucination risks, is unchanged; only the output format is constrained. A schema that says 'age: integer' guarantees that the value is an integer, not that it is the correct integer.
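The 'age: integer' point can be illustrated with a toy character-level grammar. This is a simplified sketch, not how any particular inference engine implements constrained decoding: `allowed_next`, `grammar_accepts`, and the `END` marker are hypothetical names, and real systems mask model tokens rather than single characters.

```python
END = "<end>"  # hypothetical marker meaning "the value may stop here"
DIGITS = set("0123456789")


def allowed_next(prefix: str) -> set:
    """Symbols a toy non-negative-integer grammar permits after `prefix`."""
    if prefix == "":
        return set(DIGITS)   # must start with a digit
    if prefix == "0":
        return {END}         # no leading zeros: "0" can only end
    return DIGITS | {END}    # otherwise: more digits, or stop


def grammar_accepts(value: str) -> bool:
    """Walk the value one character at a time, as a decoder mask would."""
    prefix = ""
    for ch in value:
        if ch not in allowed_next(prefix):
            return False
        prefix += ch
    return END in allowed_next(prefix)


true_age = "34"          # assumed ground truth, unknown to the grammar
for candidate in ["34", "97", "3x", "07"]:
    print(candidate, grammar_accepts(candidate), candidate == true_age)
```

The mask rejects "3x" and "07" because they break the syntax, but it accepts "34" and "97" equally: nothing in the grammar knows which integer is correct, which is why the check against ground truth has to happen elsewhere.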
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T20:29:13.381586+00:00— report_created — created