Report #1727
[research] How do I get reliable, schema-valid JSON from LLMs across providers?
Use each provider's native structured-output \(constrained-decoding\) mode with a strict JSON Schema: OpenAI response\_format of type json\_schema with strict=true; Anthropic output\_config.format \(or forced tool use with input\_schema on older models\); Gemini responseMimeType plus responseSchema; and, for self-hosted vLLM or Ollama, grammar backends such as XGrammar, Outlines, or GBNF. Do not rely on describing the schema in the prompt to enforce it. Always validate the parsed output with Pydantic or Zod, and treat refusals as a first-class error path.
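A minimal sketch of the request fragments named above, all built from one shared schema. The invoice schema and the three helper functions are hypothetical. The field names follow the OpenAI Chat Completions, Anthropic Messages \(forced tool use\), and Gemini REST APIs as I understand them, so check them against current provider docs before use. Nothing here makes a network call.

```python
# Hypothetical example schema. Strict modes generally expect every property
# to appear in "required" and "additionalProperties" to be false.
INVOICE_SCHEMA = {
    "type": "object",
    "properties": {
        "vendor": {"type": "string"},
        "total": {"type": "number"},
    },
    "required": ["vendor", "total"],
    "additionalProperties": False,
}


def openai_params(name, schema):
    """Chat Completions fragment: response_format with a strict json_schema."""
    return {
        "response_format": {
            "type": "json_schema",
            "json_schema": {"name": name, "strict": True, "schema": schema},
        }
    }


def anthropic_tool_params(name, schema):
    """Messages API fragment: force one tool whose input_schema is the schema."""
    return {
        "tools": [
            {
                "name": name,
                "description": f"Record a {name}.",
                "input_schema": schema,
            }
        ],
        "tool_choice": {"type": "tool", "name": name},
    }


def gemini_params(schema):
    """generateContent fragment (REST field names).

    Assumption: the schema is passed through unchanged. Gemini's
    responseSchema accepts a subset of schema keywords, so some schemas
    may need adapting first.
    """
    return {
        "generationConfig": {
            "responseMimeType": "application/json",
            "responseSchema": schema,
        }
    }
```

Each helper returns a plain dictionary to merge into the rest of the request \(model, messages, and so on\), which keeps the schema defined in one place while the provider-specific wrapping stays isolated.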
Journey Context:
Prompting with 'return valid JSON only' raised success rates but never reached 100%, because the model can still emit markdown fences, omit fields, or invent keys. Constrained decoding masks invalid next tokens at inference time, which guarantees the shape of the output but not its truth. OpenAI's strict mode compiles the schema into a grammar; Anthropic uses tool schemas or native output\_config.format; Gemini uses responseSchema; self-hosted stacks use XGrammar \(vLLM\), Outlines, or llama.cpp GBNF. The schema itself counts as input tokens \(roughly 100 to 1,000 tokens\), so large schemas add latency and cost. A frequent production bug is forgetting that models can refuse: OpenAI returns a refusal field, and Anthropic and Gemini expose comparable signals \(for example a refusal stop reason or a safety finish reason\), so check for these before parsing. Because provider guarantees cover shape only, downstream validation with Pydantic or Zod remains mandatory for value-level rules. For cross-provider code, normalize the response contract in your own layer and avoid relying on provider-specific quirks.
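The refusal handling and validation described above could be normalized in one layer roughly as follows. This is a sketch under assumptions: the response dictionaries mimic raw JSON response bodies; the refusal signals checked \(OpenAI's refusal field, an Anthropic stop reason of "refusal", Gemini's SAFETY finish reason or prompt block reason\) should be verified against current provider docs; and the hand-written validator stands in for a Pydantic or Zod model so the example runs on the standard library alone. Invoice, ModelRefusal, and SchemaViolation are hypothetical names.

```python
import json
from dataclasses import dataclass


class ModelRefusal(Exception):
    """The provider signalled that the model declined to answer."""


class SchemaViolation(Exception):
    """The payload is missing, unparseable, or fails our own validation."""


@dataclass
class Invoice:  # hypothetical target type
    vendor: str
    total: float


def extract_payload(provider, response):
    """Return the raw payload (JSON string or dict) or raise ModelRefusal."""
    if provider == "openai":
        message = response["choices"][0]["message"]
        if message.get("refusal"):
            raise ModelRefusal(message["refusal"])
        return message["content"]
    if provider == "anthropic":
        # Assumption: refusals surface as stop_reason == "refusal".
        if response.get("stop_reason") == "refusal":
            raise ModelRefusal("anthropic stop_reason=refusal")
        for block in response["content"]:
            if block["type"] == "tool_use":
                return block["input"]  # forced tool use: already a dict
        raise SchemaViolation("no tool_use block in response")
    if provider == "gemini":
        # Assumption: blocks surface via promptFeedback or finishReason.
        feedback = response.get("promptFeedback") or {}
        if feedback.get("blockReason"):
            raise ModelRefusal(f"gemini blockReason={feedback['blockReason']}")
        candidate = response["candidates"][0]
        if candidate.get("finishReason") == "SAFETY":
            raise ModelRefusal("gemini finishReason=SAFETY")
        return candidate["content"]["parts"][0]["text"]
    raise ValueError(f"unknown provider: {provider}")


def validate_invoice(payload):
    """Stand-in for a Pydantic/Zod model: checks shape, types, and one value rule."""
    try:
        data = json.loads(payload) if isinstance(payload, str) else payload
    except json.JSONDecodeError as exc:
        raise SchemaViolation(f"not valid JSON: {exc}") from exc
    if not isinstance(data, dict) or set(data) != {"vendor", "total"}:
        raise SchemaViolation(f"unexpected structure: {data!r}")
    vendor, total = data["vendor"], data["total"]
    if not isinstance(vendor, str):
        raise SchemaViolation("vendor must be a string")
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(total, bool) or not isinstance(total, (int, float)):
        raise SchemaViolation("total must be a number")
    if total < 0:
        # A value-level rule that shape-only guarantees cannot enforce.
        raise SchemaViolation("total must be non-negative")
    return Invoice(vendor=vendor, total=float(total))
```

Callers then deal with two exception types and one result type regardless of provider, for example validate\_invoice\(extract\_payload\("openai", body\)\). The negative-total check illustrates why validation stays necessary even with constrained decoding: a schema-valid payload can still carry a value the application must reject.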
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-15T06:54:11.843187+00:00— report_created — created