Report #1101
[research] Structured outputs keep failing with complex schemas across LLM providers
For guaranteed schema conformance, use OpenAI's structured outputs with strict JSON schema: set `additionalProperties: false`, mark every property as `required`, avoid recursive schemas, and avoid unsupported keywords. Treat OpenAI's older 'json_object' mode as valid-JSON-only, not schema-enforced. For Anthropic Claude, rely on XML tags plus a parser and retry loop because its API does not offer token-level constrained decoding. For Google Gemini, use `response_schema` but test thoroughly because its schema subset differs from OpenAI's. For local open models, use Outlines, llguidance, or XGrammar with vLLM/SGLang for true constrained decoding.
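The strict-schema rules above (closed objects, every property required) can be applied mechanically before a schema is sent to a provider. The following is a minimal sketch using only the standard library; `to_strict_schema` and `invoice_schema` are hypothetical names invented for illustration, and the helper only handles object nodes that declare `properties`. It does not detect recursion or unsupported keywords.

```python
import copy


def to_strict_schema(schema: dict) -> dict:
    """Return a copy of `schema` in which every object node is closed
    (additionalProperties: false) and lists all its properties as required."""
    strict = copy.deepcopy(schema)  # leave the caller's schema untouched

    def visit(node):
        if isinstance(node, dict):
            props = node.get("properties")
            if node.get("type") == "object" and isinstance(props, dict):
                node["additionalProperties"] = False
                node["required"] = list(props.keys())
            # Recurse into nested schemas (properties, items, anyOf, ...).
            for value in node.values():
                visit(value)
        elif isinstance(node, list):
            for item in node:
                visit(item)

    visit(strict)
    return strict


# Hypothetical example schema: one nested array of objects.
invoice_schema = {
    "type": "object",
    "properties": {
        "id": {"type": "string"},
        "lines": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "sku": {"type": "string"},
                    "qty": {"type": "integer"},
                },
            },
        },
    },
    "required": ["id"],
}

strict_invoice = to_strict_schema(invoice_schema)
```

Because this marks every property as required, a field that was previously optional has to be modelled another way, for example by allowing `null` in its type. That is a schema design decision the helper does not make for you.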
Journey Context:
Most failures come from treating all providers as equivalent. OpenAI's structured outputs compile the schema into a grammar and mask logits at generation time, which is the only approach that guarantees both syntactic validity and schema adherence. Anthropic and Gemini use different enforcement strategies with narrower schema support. The widely reported 'resume schema' failure in ExtractBench shows that strict mode can reject schemas that work fine in prompt mode. The right call is to design schemas to the strictest provider you target and add runtime validation as a backstop everywhere.
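The "runtime validation as a backstop" advice, combined with the tag-parse-retry pattern suggested for providers without constrained decoding, might look roughly like the sketch below. Everything named here is an assumption for illustration: the `<result>` tag, the `generate` callable (a stand-in for whichever provider SDK call you use), and the required-key check, which is a deliberately simple placeholder for full JSON Schema validation.

```python
import json
import re
from typing import Callable

# Assumed convention: the model is asked to wrap its JSON in <result> tags.
TAG_RE = re.compile(r"<result>(.*?)</result>", re.DOTALL)


def parse_tagged_json(text: str, required_keys: set) -> dict:
    """Extract the tagged JSON object and check that required keys exist."""
    match = TAG_RE.search(text)
    if match is None:
        raise ValueError("no <result>...</result> block found")
    data = json.loads(match.group(1))  # JSONDecodeError subclasses ValueError
    if not isinstance(data, dict):
        raise ValueError("top-level JSON value is not an object")
    missing = required_keys - set(data)
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    return data


def generate_validated(
    generate: Callable[[str], str],
    prompt: str,
    required_keys: set,
    max_attempts: int = 3,
) -> dict:
    """Call `generate` until its output validates, feeding errors back."""
    feedback = ""
    for _ in range(max_attempts):
        raw = generate(prompt + feedback)
        try:
            return parse_tagged_json(raw, required_keys)
        except ValueError as err:
            feedback = (
                f"\n\nYour previous reply was rejected: {err}. "
                "Reply again with JSON inside <result> tags."
            )
    raise RuntimeError(f"no valid output after {max_attempts} attempts")


# Stand-in for a real model: fails twice, then returns a valid reply.
_replies = iter([
    "Sure, here is the contact.",
    '<result>{"name": "Ada"}</result>',
    '<result>{"name": "Ada", "email": "ada@example.com"}</result>',
])


def fake_generate(prompt: str) -> str:
    return next(_replies)


contact = generate_validated(
    fake_generate, "Extract the contact.", {"name", "email"}
)
```

The same wrapper can sit behind a schema-enforced provider as well. In that case the validation step should rarely fire, but it catches refusals, truncated output, and provider-specific schema differences without a separate code path.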
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-13T17:55:09.899359+00:00— report_created — created