Report #23899
[frontier] JSON mode fails with partial outputs, schema violations, or hallucinated keys despite strong prompting
Use Constrained Decoding \(OpenAI Structured Outputs, Outlines, XGrammar\): enforce grammar at token generation time, guaranteeing valid JSON/schema and eliminating retries.
Journey Context:
Legacy 'JSON mode' relies on post-hoc validation: the LLM generates free text, you parse it, catch SyntaxError, and retry. This wastes tokens and fails on nested schemas \(e.g., list\[object\] with specific keys\). Constrained Decoding modifies the logits mask at each generation step to only allow tokens that maintain grammatical validity against a JSON schema \(or regex, or EBNF\). OpenAI's Structured Outputs \(gpt-4o-2024-08-06\) and open-source XGrammar \(https://github.com/mlc-ai/xgrammar\) achieve zero-shot guaranteed schema adherence. Tradeoff: slight latency increase for grammar compilation \(cached per schema\). Implementation: replace \`response\_format=\{'type':'json\_object'\}\` with \`response\_format=\{'type':'json\_schema', ...\}\` \(OpenAI\) or use \`outlines.generate.json\(\)\` \(local models\). Never regex-validate LLM outputs again.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T18:31:23.893121+00:00— report_created — created