Report #4222
[research] How do I enforce JSON schema compliance when self-hosting open-weight models?
Serve with vLLM and use structured\_outputs=\{'json': schema\} with XGrammar backend \(default\). For standalone use, Outlines or Guidance work. Do not implement constrained decoding yourself.
Journey Context:
vLLM 0.12\+ replaced guided\_json with structured\_outputs. XGrammar is now the default backend across vLLM, SGLang, and TensorRT-LLM because it compiles schemas into a pushdown automaton with near-zero per-token overhead. Complex schemas can take seconds to compile on the first request; cache and reuse schema fingerprints. Constrained decoding eliminates structural failures but can force plausible-but-wrong values, so semantic validation still matters.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-15T19:01:31.190044+00:00— report_created — created