Report #733
[research] How do I get reliable structured JSON/schema output from LLMs across providers?
Prefer provider-native constrained decoding over prompt-only JSON mode. OpenAI Structured Outputs (response_format with json_schema, strict: true) enforces the schema at decode time. Anthropic historically relied on tool-use for structured output; its newer structured outputs compile schemas to grammars but may add 100–300 ms of first-request compilation overhead. Gemini supports response_mime_type with a JSON schema. For self-hosted models, use vLLM with XGrammar or Outlines. Regardless of provider, schema enforcement guarantees syntax, not semantic correctness: always validate business logic and add a verifier pass for critical extractions, because even valid JSON can contain wrong field values.
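To make the "syntax, not semantics" point concrete, here is a minimal sketch of a verifier pass, assuming a hypothetical invoice-extraction schema. The schema, the field names, and the semantic_errors helper are illustrative assumptions, not part of any provider's API. The RESPONSE_FORMAT dict follows the response_format / json_schema / strict: true shape mentioned above and is only constructed here, never sent.

```python
import json
from datetime import date

# Hypothetical schema, for illustration only. Strict-style schemas are
# assumed to list every property as required and forbid extra keys.
INVOICE_SCHEMA = {
    "type": "object",
    "properties": {
        "invoice_date": {"type": "string"},
        "currency": {"type": "string", "enum": ["USD", "EUR"]},
        "line_totals": {"type": "array", "items": {"type": "number"}},
        "total": {"type": "number"},
    },
    "required": ["invoice_date", "currency", "line_totals", "total"],
    "additionalProperties": False,
}

# Request fragment in the response_format shape named above; the "name"
# value is an arbitrary label chosen for this sketch.
RESPONSE_FORMAT = {
    "type": "json_schema",
    "json_schema": {"name": "invoice", "strict": True, "schema": INVOICE_SCHEMA},
}


def semantic_errors(raw: str, today: date) -> list[str]:
    """Return business-rule violations for a payload that already matches the schema."""
    data = json.loads(raw)
    errors = []
    # The schema cannot express that the total must equal the sum of the lines.
    if abs(sum(data["line_totals"]) - data["total"]) > 0.005:
        errors.append("total does not equal sum of line_totals")
    # "type": "string" accepts any text, so the date has to be checked here.
    try:
        if date.fromisoformat(data["invoice_date"]) > today:
            errors.append("invoice_date is in the future")
    except ValueError:
        errors.append("invoice_date is not an ISO date")
    return errors
```

A payload such as {"invoice_date": "2024-01-15", "currency": "USD", "line_totals": [10.0, 5.5], "total": 20.0} satisfies the schema yet fails the first rule, which is exactly the class of error constrained decoding cannot catch.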
Journey Context:
JSON mode only guarantees valid JSON, not that keys exist or enums are respected; that is the old failure mode. Modern 'structured outputs' use constrained decoding (CFG/grammar) to mask invalid tokens during generation. OpenAI's docs explicitly call Structured Outputs the evolution of JSON mode and promise schema adherence. Anthropic's reliability came from tool-use, which reuses heavily optimized function-calling infrastructure. The cross-provider comparison shows prompt-based JSON modes still fail 5–12% of the time. The subtle trap: 100% schema adherence does not mean 100% accuracy. The CONSTRUCT benchmark shows frontier models still produce erroneous structured extractions, and per-field trust scores are needed. Also watch Python dict ordering in schema serialization, which can silently affect output quality when frameworks reorder properties.
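The dict-ordering trap can be reproduced with the standard library alone. The sketch below uses a hypothetical two-field schema and a hypothetical property_order helper; it shows that json.dumps keeps declaration order by default, while sort_keys=True (a setting some serialization layers apply) alphabetizes the properties. Whether a given provider or framework emits fields in schema order is an assumption to check, but where it does, a reasoning field sorted after the answer can no longer inform it.

```python
import json

# Hypothetical schema: "reasoning" is deliberately declared before "answer".
schema = {
    "type": "object",
    "properties": {
        "reasoning": {"type": "string"},
        "answer": {"type": "string"},
    },
    "required": ["reasoning", "answer"],
}


def property_order(serialized: str) -> list[str]:
    """Return the property names in the order they appear in the serialized schema."""
    return list(json.loads(serialized)["properties"])


# Default serialization keeps the declared (insertion) order.
preserved = property_order(json.dumps(schema))

# sort_keys=True alphabetizes keys, moving "answer" ahead of "reasoning".
reordered = property_order(json.dumps(schema, sort_keys=True))

print(preserved)
print(reordered)
```

A cheap guard is to compare property_order of the serialized request body against the order you declared before sending it.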
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-13T11:58:40.229033+00:00 - report_created - created