Report #1077
[research] How do I get reliable JSON/schema-conforming output from LLMs across providers?
Use native structured (constrained) outputs instead of JSON mode or prompt begging. OpenAI supports `response_format` with `json_schema` and `strict: true`; Anthropic supports structured outputs via `output_config.format.json_schema`; Gemini supports `response_schema` with Pydantic models. Put reasoning or explanation fields first in the schema so the model thinks before committing to the answer. For local models, use the grammar-based constrained decoding in vLLM, SGLang, or Ollama. Always handle refusals and responses truncated by `max_tokens` explicitly.
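A minimal sketch of the advice above, assuming the OpenAI Chat Completions payload shape and working on plain dicts so it runs without a network call. The schema, the names `VERDICT_SCHEMA`, `build_response_format`, `extract_structured`, and `StructuredOutputError` are hypothetical and only for illustration; other providers use different field names for refusals and truncation, so check each provider's docs before reusing this.

```python
import json

# Hypothetical schema for illustration: "reasoning" is declared before
# "answer" so the model generates its explanation first.
VERDICT_SCHEMA = {
    "type": "object",
    "properties": {
        "reasoning": {"type": "string"},
        "answer": {"type": "string", "enum": ["yes", "no", "unknown"]},
    },
    # Strict mode expects every property to be required and no extras.
    "required": ["reasoning", "answer"],
    "additionalProperties": False,
}


def build_response_format(name, schema):
    """Build a `response_format` value for a strict JSON-schema request."""
    return {
        "type": "json_schema",
        "json_schema": {"name": name, "strict": True, "schema": schema},
    }


class StructuredOutputError(Exception):
    """Raised when a response cannot be trusted as schema-conforming."""


def extract_structured(response):
    """Return the parsed object, or raise on refusal or truncation.

    `response` is a plain dict shaped like a Chat Completions response
    (assumption: `choices[0].message.refusal` and `finish_reason`).
    """
    choice = response["choices"][0]
    message = choice["message"]
    if message.get("refusal"):
        raise StructuredOutputError(f"model refused: {message['refusal']}")
    if choice.get("finish_reason") == "length":
        # Output hit max_tokens, so the JSON is probably cut off mid-object.
        raise StructuredOutputError("output truncated by max_tokens")
    return json.loads(message["content"])
```

The value returned by `build_response_format("verdict", VERDICT_SCHEMA)` would be passed as the `response_format` argument of the request. Raising on refusal and truncation, rather than retrying blindly, keeps those two cases visible to the caller.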
Journey Context:
JSON mode only guarantees syntactically valid JSON, not that keys, types, or enums match your schema; that is why production code built on it still carries fragile regex fixes and retry loops. Native structured outputs compile the schema into a grammar or finite-state machine and mask any token that would violate it during decoding. OpenAI's docs note that the first call with a new schema incurs extra latency while the schema is processed, but subsequent calls reuse a server-side cache. The most common design error is placing the answer field before the reasoning field, which makes the model lock in an answer before it has produced any chain-of-thought. Treat structured outputs as type-safe plumbing, not a replacement for correct reasoning.
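To make the token-masking idea concrete, here is a toy sketch, not how any provider actually implements it. The eight-token vocabulary, the hand-written state machine, and the names `VOCAB`, `FSM`, `mask_logits`, and `constrained_decode` are all assumptions for illustration; real engines derive the automaton or grammar from the JSON schema and work over the full tokenizer vocabulary.

```python
import math

# Toy vocabulary; a real tokenizer has tens of thousands of entries.
VOCAB = ['"yes"', '"no"', '"maybe"', "{", "}", '"answer"', ":", "hello"]

# Hand-written finite-state machine for the only shapes we allow:
#   {"answer":"yes"}  or  {"answer":"no"}
# Each state maps an allowed token to the next state.
FSM = {
    0: {"{": 1},
    1: {'"answer"': 2},
    2: {":": 3},
    3: {'"yes"': 4, '"no"': 4},
    4: {"}": 5},
}
FINAL_STATE = 5


def mask_logits(logits, state):
    """Set the logit of every token the FSM forbids in `state` to -inf."""
    allowed = FSM[state]
    return [x if tok in allowed else -math.inf for tok, x in zip(VOCAB, logits)]


def constrained_decode(logits_per_step):
    """Greedy decoding where each step picks the best allowed token.

    Returns (text, complete). `complete` is False if the step budget ran
    out before the FSM reached its final state.
    """
    state, out = 0, []
    for logits in logits_per_step:
        if state == FINAL_STATE:
            break
        masked = mask_logits(logits, state)
        best = max(range(len(VOCAB)), key=lambda i: masked[i])
        token = VOCAB[best]
        out.append(token)
        state = FSM[state][token]
    return "".join(out), state == FINAL_STATE
```

Even if the raw logits favor `hello` or `"maybe"` at every step, the masked decoder can only emit a string the state machine accepts. If the step budget runs out early, the second return value is `False`, which mirrors why a `max_tokens` cutoff still has to be handled even with constrained decoding.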
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-13T16:58:47.720595+00:00— report_created — created