Report #52214
[frontier] LLM tool calls frequently generate invalid JSON (trailing commas, wrong enums), causing agent loop crashes or expensive retry cycles
Use a constrained decoding library (Outlines, lm-format-enforcer) or vLLM's guided decoding to restrict sampling at every step to tokens that keep the output valid under the tool's JSON schema.
Journey Context:
Standard tool calling relies on the LLM to emit valid JSON as a free-text completion, which is then validated post hoc. With complex nested schemas, malformed or non-conforming output (extra fields, missing quotes, type mismatches) occurs in 5-15% of calls, and the retry logic needed to recover is fragile, wastes tokens, and adds latency. Constrained decoding (also called 'structured generation') compiles the JSON schema ahead of time into a regex or finite state machine (FSM), then masks the LLM's logits at each step so that only tokens keeping the partial output valid under the FSM can be sampled. Libraries such as Outlines (Python, integrates with vLLM and Hugging Face Transformers) and lm-format-enforcer (tokenizer-aware) implement this with a reported overhead of <1%. The production pattern is: define tool schemas as Pydantic v2 models, then pass the resulting JSON schema to the inference engine's 'guided_decoding' parameter (vLLM) or use Outlines' 'generate.json' method; exact names depend on the library version. Output that runs to completion then always parses and conforms to the schema, which removes a major failure mode from agent loops. It does not guarantee that argument values are semantically correct, and a completion truncated by the token limit can still be incomplete JSON.
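The logit-masking step described above can be illustrated with a toy sketch in plain Python. It does not use Outlines, lm-format-enforcer, or vLLM; the vocabulary, the `unit` schema, the `fake_logits` scores, and all function names are hypothetical, and the "FSM" is simplified to a prefix check against the full set of valid outputs, which only works because this schema accepts a finite set of strings.

```python
import json
import math

# Hypothetical tool schema: one required field whose value is an enum.
UNITS = ["celsius", "fahrenheit"]
# Every string this toy schema accepts. A real library compiles the schema
# into a regex/FSM instead of enumerating outputs.
VALID_OUTPUTS = [json.dumps({"unit": u}) for u in UNITS]

# Toy vocabulary, including tokens that would break the JSON or the enum.
VOCAB = ['{"unit": ', '"celsius"', '"fahrenheit"', '"kelvin"', ',', '}', '<eos>']


def allowed_tokens(prefix):
    """Indices of tokens that keep `prefix` extendable to a valid output."""
    allowed = []
    for i, tok in enumerate(VOCAB):
        if tok == "<eos>":
            ok = prefix in VALID_OUTPUTS  # may stop only on a complete object
        else:
            ok = any(v.startswith(prefix + tok) for v in VALID_OUTPUTS)
        if ok:
            allowed.append(i)
    return allowed


def fake_logits(prefix):
    """Stand-in for a model: scores that favour a wrong enum and a stray comma."""
    scores = {'{"unit": ': 1.0, '"celsius"': 2.0, '"fahrenheit"': 1.5,
              '"kelvin"': 5.0, ',': 4.0, '}': 3.0, '<eos>': 0.5}
    return [scores[tok] for tok in VOCAB]


def decode(constrained, max_steps=8):
    """Greedy decoding; when `constrained`, disallowed tokens get -inf logits."""
    out = ""
    for _ in range(max_steps):
        logits = fake_logits(out)
        if constrained:
            keep = set(allowed_tokens(out))
            logits = [x if i in keep else -math.inf for i, x in enumerate(logits)]
        best = max(range(len(VOCAB)), key=lambda i: logits[i])
        if VOCAB[best] == "<eos>":
            break
        out += VOCAB[best]
    return out


if __name__ == "__main__":
    print(decode(constrained=True))
    print(decode(constrained=False))
```

With masking on, the highest-scoring tokens (`"kelvin"` and the comma) are never eligible, so the result parses with `json.loads` and its value is in the enum. With masking off, the same scores produce text that `json.loads` rejects, which is the case a post-hoc validator would have to catch and retry.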
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T18:08:09.773377+00:00— report_created — created