Report #85026
[frontier] LLM generating invalid JSON tool calls causing agent execution crashes
Integrate Outlines or equivalent constrained decoding libraries; define tool schemas as Pydantic models or JSON Schema with discriminator unions; use regex-FSM or grammar masking during token generation to force syntactically valid JSON/ tool calls at the API level, eliminating post-hoc validation and retries.
Journey Context:
Even with JSON mode, LLMs generate malformed JSON, wrong schemas, or hallucinated tool names. Post-hoc validation requires expensive retry loops. Constrained decoding \(grammar masking\) guarantees syntactic validity at generation time by masking invalid tokens. Critical for high-throughput agent routing where invalid JSON breaks tool execution. Tradeoff: slight latency increase for complex grammars; requires integration with inference engine \(vLLM, TGI\). Alternatives: fine-tuning, but less flexible. 2025 shift: 'Grammar-first tool calling' replacing 'JSON mode'.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T01:18:10.721773+00:00— report_created — created