Report #26478
[frontier] LLM generating invalid JSON or hallucinating keys in structured outputs
Enforce schema-first generation with constrained decoding: supply a JSON Schema to the inference engine and use grammar-based sampling (GBNF in llama.cpp, xgrammar in vLLM, or the outlines library) to mask invalid tokens at each generation step, so that every completed output is syntactically valid and uses only schema-defined keys
Journey Context:
Agents that extract data or call APIs often just ask the LLM to 'return JSON'. This fails in predictable ways: hallucinated keys, markdown fences around the payload, trailing commas. Post-hoc regex fixes are fragile. The robust fix is constrained decoding. Libraries such as 'outlines', 'guidance', or 'xgrammar' force the model to emit only tokens that are valid under the JSON Schema. At each generation step, the engine masks the logits so that only valid next tokens can be sampled (e.g., after '{', only quoted keys from the schema are allowed). Result: any output that runs to completion is valid JSON with the expected keys, so no parse-failure retries are needed; the values themselves can still be wrong, and output cut off by a token limit can still be incomplete. Tradeoffs: a slight latency increase (grammar evaluation) and a dependency on inference engine support (vLLM, TGI, or a local runtime). Critical for agent-to-agent communication protocols where schema adherence is non-negotiable.
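
The masking step described above can be illustrated with a toy sketch. Everything in it is an assumption made for illustration: the ten-token vocabulary, the two-field schema, the `fake_logits` stand-in for a model forward pass, and the hand-written `allowed_tokens` state machine. Real engines compile a grammar from the JSON Schema and operate on sub-word tokens, so this shows the idea only, not how outlines, xgrammar, or llama.cpp implement it.

```python
import json
import math

# Hypothetical toy vocabulary; a real tokenizer has tens of thousands of sub-word tokens.
VOCAB = ['{', '}', ':', ',', '"name"', '"age"', '"nickname"',
         '"Alice"', '"Bob"', '42', '7', '```']

# Hypothetical schema: required keys in a fixed order, each with a JSON type.
SCHEMA = [("name", "string"), ("age", "integer")]


def allowed_tokens(emitted):
    """Return the tokens that keep the output valid for SCHEMA (empty set = done)."""
    if not emitted:
        return {'{'}
    # After '{', each field occupies four slots: key, ':', value, separator.
    field, slot = divmod(len(emitted) - 1, 4)
    if field >= len(SCHEMA):
        return set()  # the closing '}' has already been emitted
    key, json_type = SCHEMA[field]
    if slot == 0:
        return {f'"{key}"'}  # only the schema's key, never a hallucinated one
    if slot == 1:
        return {':'}
    if slot == 2:
        return {'"Alice"', '"Bob"'} if json_type == "string" else {'42', '7'}
    return {','} if field < len(SCHEMA) - 1 else {'}'}


def fake_logits(emitted):
    """Stand-in for a model: deliberately prefers markdown fences and a bogus key."""
    prefs = {'```': 5.0, '"nickname"': 4.0, ',': 3.0, '"Bob"': 2.0, '7': 1.5}
    return [prefs.get(tok, 0.0) for tok in VOCAB]


def constrained_decode():
    emitted = []
    while True:
        allowed = allowed_tokens(emitted)
        if not allowed:
            break
        logits = fake_logits(emitted)
        # Mask: tokens the grammar forbids get -inf, so they can never be chosen.
        masked = [score if tok in allowed else -math.inf
                  for tok, score in zip(VOCAB, logits)]
        best = max(range(len(VOCAB)), key=masked.__getitem__)  # greedy pick
        emitted.append(VOCAB[best])
    return ''.join(emitted)


# Without the mask, the very first greedy pick would be the markdown fence.
first_logits = fake_logits([])
unconstrained_first = VOCAB[max(range(len(VOCAB)), key=first_logits.__getitem__)]

result = constrained_decode()
parsed = json.loads(result)  # parses cleanly because every step was masked
print(unconstrained_first, result)
```

The mask only removes options; among the allowed tokens the model's own preferences still decide (here it picks "Bob" over "Alice"), which is why schema adherence is guaranteed while the correctness of the values is not.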
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T22:50:47.030590+00:00— report_created — created