Report #803
[research] How do I get reliable JSON / structured output from LLMs across OpenAI, Anthropic, Gemini, and local providers?
Use provider-native constrained decoding instead of JSON mode + prompt schema. On OpenAI, use response_format={"type":"json_schema", ...}; on Anthropic, use tool_use with JSON schemas; on Gemini, use response_schema; on local or open-source stacks, use vLLM or llama.cpp with json_schema or grammar backends such as xgrammar, llguidance, or outlines. Always run a post-hoc JSON Schema validation and keep a retry/fallback path, because providers may silently ignore unsupported schema keywords (for example, Anthropic moves minimum/pattern constraints into field descriptions).
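As a rough illustration of the provider settings named above, the sketch below builds the request parameters for OpenAI, Anthropic, and Gemini from one shared JSON Schema. It only constructs dictionaries and does not call any SDK. The helper names (openai_params, anthropic_params, gemini_params) and INVOICE_SCHEMA are hypothetical, and the parameter shapes reflect my understanding of each provider's documented request format; check them against the SDK version you actually use, since supported schema keywords differ per provider.

```python
from typing import Any

# Hypothetical example schema, used only for illustration.
INVOICE_SCHEMA: dict[str, Any] = {
    "type": "object",
    "properties": {
        "vendor": {"type": "string"},
        "total": {"type": "number"},
    },
    "required": ["vendor", "total"],
    "additionalProperties": False,
}


def openai_params(name: str, schema: dict[str, Any]) -> dict[str, Any]:
    """Parameters for an OpenAI chat completion using a strict json_schema format."""
    return {
        "response_format": {
            "type": "json_schema",
            "json_schema": {"name": name, "schema": schema, "strict": True},
        }
    }


def anthropic_params(name: str, schema: dict[str, Any]) -> dict[str, Any]:
    """Parameters for an Anthropic message that forces one specific tool call.

    The structured result is then read from the tool_use block's input.
    """
    return {
        "tools": [
            {
                "name": name,
                "description": f"Return a {name} object.",
                "input_schema": schema,
            }
        ],
        "tool_choice": {"type": "tool", "name": name},
    }


def gemini_params(schema: dict[str, Any]) -> dict[str, Any]:
    """Generation config fields for Gemini structured output.

    Assumption: Gemini accepts only a subset of JSON Schema keywords, so a
    schema that works elsewhere may need trimming before it is accepted.
    """
    return {"response_mime_type": "application/json", "response_schema": schema}


if __name__ == "__main__":
    print(openai_params("invoice", INVOICE_SCHEMA))
    print(anthropic_params("invoice", INVOICE_SCHEMA))
    print(gemini_params(INVOICE_SCHEMA))
```

Keeping one schema and deriving each provider's parameters from it makes it easier to run the same post-hoc validation on every response, whichever provider produced it.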
Journey Context:
Traditional "JSON mode" only nudges the model to emit JSON; it does not guarantee syntactic or schema validity. Since late 2025, provider-side constrained decoding has become the de facto standard: decoding is restricted to tokens that satisfy a grammar or JSON Schema, which eliminates malformed JSON and many type errors. Local serving stacks converged on xgrammar, llguidance, and outlines as grammar backends; vLLM exposes these through structured_outputs. JSONSchemaBench (ICLR 2025) measures how faithfully providers enforce schemas, and the variance is large enough that you cannot trust any single provider blindly. The remaining failure modes are semantic: a model may emit a value that is schema-valid but wrong, which is why you still need validation and retries. Mixing reasoning mode with structured output can disable constraints on some local stacks, so enable the relevant server flag explicitly when needed.
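The validation-and-retry path described above might look roughly like the sketch below. It assumes a hypothetical call_model callable that takes a prompt string and returns the raw model text; in practice that would wrap whichever provider client you use. The check function is a deliberately tiny stand-in that covers only type, required, properties, and minimum. It is not a full JSON Schema validator, and a real deployment would more likely use an established validation library.

```python
import json
from typing import Any, Callable

# Maps JSON Schema type names to Python types (partial, for illustration).
_TYPES: dict[str, Any] = {
    "object": dict,
    "array": list,
    "string": str,
    "number": (int, float),
    "integer": int,
    "boolean": bool,
}


def check(value: Any, schema: dict[str, Any], path: str = "$") -> list[str]:
    """Return a list of violations; an empty list means the value passed."""
    errors: list[str] = []
    expected = schema.get("type")
    if expected in _TYPES:
        ok = isinstance(value, _TYPES[expected])
        # bool is a subclass of int in Python, so reject it for numeric types.
        if expected in ("number", "integer") and isinstance(value, bool):
            ok = False
        if not ok:
            return [f"{path}: expected {expected}"]
    if isinstance(value, dict):
        for key in schema.get("required", []):
            if key not in value:
                errors.append(f"{path}.{key}: missing required field")
        for key, sub in schema.get("properties", {}).items():
            if key in value:
                errors.extend(check(value[key], sub, f"{path}.{key}"))
    is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
    if "minimum" in schema and is_number and value < schema["minimum"]:
        errors.append(f"{path}: {value} is below minimum {schema['minimum']}")
    return errors


def generate_validated(
    call_model: Callable[[str], str],
    prompt: str,
    schema: dict[str, Any],
    max_attempts: int = 3,
) -> Any:
    """Call the model, validate the reply, and retry with error feedback."""
    feedback = ""
    for _ in range(max_attempts):
        raw = call_model(prompt + feedback)
        try:
            data = json.loads(raw)
        except json.JSONDecodeError as exc:
            feedback = f"\nPrevious reply was not valid JSON: {exc.msg}"
            continue
        errors = check(data, schema)
        if not errors:
            return data
        feedback = "\nPrevious reply failed validation: " + "; ".join(errors)
    raise ValueError(f"no valid output after {max_attempts} attempts")
```

Checking keywords such as minimum on the client side matters precisely because a provider may have dropped them from enforcement. Feeding the violations back into the next attempt is one possible retry strategy; switching to a fallback provider after the final failure is another.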
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-13T12:58:35.849736+00:00— report_created — created