Report #47580
[frontier] JSON parsing failures and schema violations when LLMs generate structured output for tool calling
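Use a constrained decoding engine such as XGrammar or Outlines to enforce a JSON Schema at token-sampling time. The engine compiles the schema into a grammar that decides which tokens may be sampled next: a context-free grammar in XGrammar, a regex-derived finite-state machine in Outlines.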
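The core loop can be sketched in plain Python with no library. At each step it computes which tokens keep the output a valid prefix of a grammar, sets every other logit to -inf, and then picks a token. This is a minimal sketch under stated assumptions: the grammar (nested arrays of single-digit integers), `VOCAB`, and `fake_logits` (a stand-in for a real model) are all hypothetical. Nested brackets are what make the grammar context-free. Because there is only one bracket type, a depth counter can play the role of the pushdown stack. Real engines precompute much of this masking work instead of recomputing it token by token.

```python
import json
import math
from typing import List, Optional, Set, Tuple

# Hypothetical toy vocabulary; "hello" and "}" are junk the grammar never allows.
VOCAB = ["[", "]", ",", "1", "2", "hello", "}", "<eos>"]

# Grammar: value := DIGIT | "[" [ value ("," value)* ] "]"
# Parser state: (open-bracket depth, category of the previous token).
# With one bracket type, the depth counter plays the role of the pushdown stack.
State = Tuple[int, Optional[str]]


def allowed_tokens(state: State) -> Set[str]:
    """Tokens that keep the output a valid prefix of the grammar."""
    depth, prev = state
    if prev in (None, "[", ","):  # a value must come next
        opts = {"[", "1", "2"}
        if prev == "[":
            opts.add("]")  # "[]" is a valid empty array
        return opts
    # prev == "value": continue or close the enclosing array, or stop at top level
    return {",", "]"} if depth > 0 else {"<eos>"}


def advance(state: State, tok: str) -> State:
    depth, _ = state
    if tok == "[":
        return depth + 1, "["
    if tok == "]":
        return depth - 1, "value"  # closing an array completes a value
    if tok == ",":
        return depth, ","
    return depth, "value"  # a digit


def fake_logits(step: int) -> List[float]:
    """Stand-in for a model: favors junk tokens, then a scripted token, then "]"."""
    script = ["[", "1", ",", "[", "2"]  # afterwards the "model" just wants to stop
    wanted = script[step] if step < len(script) else "<eos>"
    scores = {"hello": 10.0, "}": 9.0, "]": 1.0}
    scores[wanted] = 5.0
    return [scores.get(tok, 0.0) for tok in VOCAB]


def constrained_greedy_decode(max_steps: int = 32) -> str:
    state: State = (0, None)
    out: List[str] = []
    for step in range(max_steps):
        allowed = allowed_tokens(state)
        # Mask: every token outside the grammar gets -inf and can never win.
        masked = [score if tok in allowed else -math.inf
                  for tok, score in zip(VOCAB, fake_logits(step))]
        tok = VOCAB[max(range(len(VOCAB)), key=masked.__getitem__)]
        if tok == "<eos>":
            break
        out.append(tok)
        state = advance(state, tok)
    return "".join(out)


result = constrained_greedy_decode()
print(result)  # → [1,[2]]
print(json.loads(result))  # → [1, [2]]
```

The fake model tries to stop while two arrays are still open, but `<eos>` is masked until the depth returns to zero, so the output is forced to close as `[1,[2]]`. The junk tokens never appear even though they always score highest. The `max_steps` cap shows the limit of the guarantee: masking keeps every prefix valid, but a generation that is cut off early can still be incomplete.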
Journey Context:
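Regex/JSON parsing of free-form LLM output can fail on 5-15% of production traffic, which forces fragile retry loops. Constrained decoding removes the failure at the source. At every sampling step it masks the logits of tokens that would break the schema (sets them to -inf), using an automaton compiled from the JSON Schema. That automaton is a finite-state machine for flat, regular structure, or a pushdown automaton (context-free grammar) when the schema allows nesting or recursion. Output that finishes normally is therefore guaranteed to parse and conform to the schema, with two caveats: generation cut off by a max-token limit can still be incomplete, and engines typically support only a subset of JSON Schema keywords. Latency can also drop. There are no retries, the model cannot pad its answer with prose, and some engines fill in schema-forced tokens (keys, punctuation) without a full sampling step.
XGrammar (integrated into vLLM and SGLang) and Outlines both compile a Pydantic model or JSON Schema into such a mask ahead of time; llama.cpp applies the same idea through its GBNF grammars. This replaces "JSON mode" workarounds such as lowering the temperature and hoping the output parses. Constrained decoding is becoming the default for tool-calling APIs in place of post-hoc parsing.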
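Here is a minimal sketch of the schema-to-automaton step, under several assumptions. `SCHEMA` is a hypothetical two-field mini-schema rather than real JSON Schema. `VOCAB` is a made-up set of multi-character tokens. `fake_logits` stands in for the model. `compile_schema` turns the schema into a linear FSM of fixed literals and typed value slots. `allowed_tokens` then builds the mask by running each token through the FSM character by character. Outlines does roughly this ahead of time, precomputing a state-to-allowed-tokens index instead of checking tokens per step. When the mask leaves exactly one legal token, `decode` appends it without calling the model, which illustrates how schema-forced tokens can skip sampling.

```python
import json
import math
import string
from typing import Dict, List, Optional, Tuple

# Hypothetical mini-schema (NOT full JSON Schema): field -> "string" | "integer".
SCHEMA = {"name": "string", "age": "integer"}
# Hypothetical vocabulary of multi-character tokens, like a BPE tokenizer's.
VOCAB = ['{"', "name", '":"', "Ada", "Bob", '","', 'age":', "36", "4", "}", " sure!", "<eos>"]
TARGET = '{"name":"Ada","age":36}'  # what the fake model steers toward

Segment = Tuple[str, str]  # ("lit", text) | ("str", "") | ("int", "")
State = Tuple[int, int]  # (segment index, characters consumed in that segment)

def compile_schema(schema: Dict[str, str]) -> List[Segment]:
    """Compile the schema into a linear FSM of literal segments and typed slots."""
    segs: List[Segment] = []
    lit = "{"
    for i, (key, typ) in enumerate(schema.items()):
        lit += ("," if i else "") + json.dumps(key) + ":"
        if typ == "string":
            segs += [("lit", lit + '"'), ("str", "")]
            lit = '"'  # the closing quote starts the next literal
        else:
            segs += [("lit", lit), ("int", "")]
            lit = ""
    segs.append(("lit", lit + "}"))
    return segs

def step_char(segs: List[Segment], state: State, ch: str) -> Optional[State]:
    """Advance the FSM by one character; None means the character is rejected."""
    i, off = state
    if i >= len(segs):
        return None  # the object is already closed
    kind, lit = segs[i]
    if kind == "lit":
        if ch != lit[off]:
            return None
        return (i + 1, 0) if off + 1 == len(lit) else (i, off + 1)
    # Toy value rules: strings are ASCII letters only (no escapes), ints are digits.
    if ch in (string.ascii_letters if kind == "str" else string.digits):
        return (i, off + 1)
    if kind == "int" and off == 0:
        return None  # an integer needs at least one digit
    return step_char(segs, (i + 1, 0), ch)  # slot ends; ch must open the next literal

def step_token(segs: List[Segment], state: State, tok: str) -> Optional[State]:
    for ch in tok:
        state = step_char(segs, state, ch)
        if state is None:
            return None
    return state

def allowed_tokens(segs: List[Segment], state: State) -> List[str]:
    if state[0] == len(segs):
        return ["<eos>"]  # schema satisfied: only end-of-sequence is legal
    return [t for t in VOCAB if t != "<eos>" and step_token(segs, state, t) is not None]

def fake_logits(text: str) -> List[float]:
    """Stand-in model: loves a chatty preamble, otherwise heads toward TARGET."""
    scores = []
    for tok in VOCAB:
        if tok == " sure!":
            scores.append(10.0)
        elif tok == "<eos>":
            scores.append(5.0 if text == TARGET else 0.0)
        else:
            scores.append(5.0 if TARGET.startswith(text + tok) else 0.0)
    return scores

def decode(segs: List[Segment], max_steps: int = 64) -> Tuple[str, int]:
    state: State = (0, 0)
    text, model_calls = "", 0
    for _ in range(max_steps):
        allowed = allowed_tokens(segs, state)
        if len(allowed) == 1:
            tok = allowed[0]  # forced by the schema: skip the model entirely
        else:
            model_calls += 1
            masked = [s if t in allowed else -math.inf
                      for t, s in zip(VOCAB, fake_logits(text))]
            tok = VOCAB[max(range(len(VOCAB)), key=masked.__getitem__)]
        if tok == "<eos>":
            break
        text += tok
        state = step_token(segs, state, tok)  # never None: tok was allowed
    return text, model_calls

text, calls = decode(compile_schema(SCHEMA))
print(text, calls)  # → {"name":"Ada","age":36} 4
```

`fake_logits` always scores the preamble token `" sure!"` highest, yet it is never emitted. Only 4 of the 8 emitted tokens needed a model call; the rest (`{"`, `name`, `":"`, `age":`) were the only legal choice at their step. A flat, fixed-order schema like this one is a regular language, so an FSM suffices. Nested or recursive schemas need a grammar with a stack.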
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T10:20:44.922093+00:00— report_created — created