Report #466
[research] How do I make LLM structured outputs and tool calls reliable across providers?
Use provider-native constrained decoding instead of prompt-only JSON. On OpenAI, use response_format with type json_schema and strict: true; on Gemini, set response_mime_type to application/json and supply response_json_schema; on Anthropic, define a single tool and force it with tool_choice so that its input schema acts as the output schema; on self-hosted stacks, use vLLM/SGLang structured_outputs or llama.cpp GBNF grammars. Add a Pydantic/JSON Schema validation layer that checks business rules, not just syntax.
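As a rough illustration of how the same JSON Schema maps onto each provider's request shape, here is a minimal sketch that only builds the keyword arguments and makes no network calls. The helper name structured_output_params, the example TICKET_SCHEMA, and the grouping of the Gemini fields under a config key are assumptions for illustration; the field names themselves follow the ones listed above, and exact placement may differ between SDK versions, so check the current provider docs before relying on it.

```python
from typing import Any

# Hypothetical example schema. Every property is listed in "required" and
# additionalProperties is False, which strict modes generally expect.
TICKET_SCHEMA: dict[str, Any] = {
    "type": "object",
    "properties": {
        "category": {"type": "string", "enum": ["bug", "billing", "other"]},
        "priority": {"type": "integer"},
    },
    "required": ["category", "priority"],
    "additionalProperties": False,
}


def structured_output_params(
    provider: str, name: str, schema: dict[str, Any]
) -> dict[str, Any]:
    """Return the provider-specific request fragment that enforces `schema`.

    Illustrative helper, not part of any SDK.
    """
    if provider == "openai":
        return {
            "response_format": {
                "type": "json_schema",
                "json_schema": {"name": name, "strict": True, "schema": schema},
            }
        }
    if provider == "gemini":
        # Assumed to live under the generation config of the request.
        return {
            "config": {
                "response_mime_type": "application/json",
                "response_json_schema": schema,
            }
        }
    if provider == "anthropic":
        # One tool, forced via tool_choice: its input_schema is the output schema.
        return {
            "tools": [
                {
                    "name": name,
                    "description": "Record the structured result.",
                    "input_schema": schema,
                }
            ],
            "tool_choice": {"type": "tool", "name": name},
        }
    if provider == "vllm":
        # Passed through the OpenAI-compatible client as extra_body.
        return {"extra_body": {"structured_outputs": {"json": schema}}}
    raise ValueError(f"unknown provider: {provider}")
```

Keeping one schema and translating it per provider means the downstream validation layer stays identical no matter which backend produced the JSON.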
Journey Context:
Prompt-only JSON ('return valid JSON') fails often enough to matter at scale (this report puts it at 5-15% of calls): markdown fences, missing keys, invented enum values, and type mismatches. Constrained decoding enforces the schema at the token level, so the model cannot emit a token that violates the schema; the output can still be cut off by a token limit, so check the finish reason as well. Be aware of provider differences: Claude has no native JSON mode, so forced tool_use is the idiomatic pattern; OpenAI strict mode accepts only a subset of JSON Schema (every property must be listed in required and additionalProperties must be false) and adds latency the first time a new schema is used; vLLM deprecated guided_json in favor of structured_outputs with XGrammar/Outlines backends, and XGrammar-2 adds dynamic tag dispatch for agentic tool calling. Always handle the refusal field on OpenAI responses, and validate semantic correctness after syntax.
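The "refusal first, then syntax, then semantics" order can be sketched as follows. This is a standard-library stand-in for the Pydantic layer recommended above, chosen only so that it runs without dependencies. The Invoice fields, the currency allow-list, and the dict-shaped message (with content and refusal keys, mimicking an OpenAI-style message) are all hypothetical.

```python
import json
from dataclasses import dataclass

ALLOWED_CURRENCIES = {"USD", "EUR", "GBP"}  # hypothetical business rule


class RefusalError(Exception):
    """The model declined to answer instead of returning data."""


class SemanticError(Exception):
    """The JSON matched the schema but broke a business rule."""


@dataclass
class Invoice:
    invoice_id: str
    currency: str
    total: float


def parse_invoice(message: dict) -> Invoice:
    # 1. Refusal check: constrained decoding does not prevent a refusal.
    if message.get("refusal"):
        raise RefusalError(message["refusal"])

    # 2. Syntax: with constrained decoding this should parse, but a
    #    truncated response would still raise json.JSONDecodeError here.
    data = json.loads(message["content"])
    invoice = Invoice(**data)

    # 3. Semantics: rules that the schema alone does not capture here.
    problems = []
    if invoice.currency not in ALLOWED_CURRENCIES:
        problems.append(f"unsupported currency {invoice.currency!r}")
    if invoice.total <= 0:
        problems.append("total must be positive")
    if problems:
        raise SemanticError("; ".join(problems))
    return invoice
```

In a real pipeline a SemanticError would typically trigger a retry with the error message fed back to the model, while a RefusalError would be surfaced to the caller rather than retried blindly.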
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-13T07:58:46.515337+00:00— report_created — created