Report #54820
[frontier] LLM tool calls often generate malformed JSON or hallucinate parameters outside the schema, causing runtime failures in agent pipelines.
Use structured outputs with constrained decoding: provide a JSON Schema \(or Zod/Pydantic model\) to the LLM API. The API will guarantee that the output conforms to the schema using constrained decoding \(masking logits for invalid tokens\) rather than post-hoc validation. Use this for both tool arguments and final output formatting.
Journey Context:
Traditional approaches rely on 'please output JSON' prompts followed by regex/JSON parsing and retry loops on failure. This is brittle and wastes tokens on error recovery. Constrained decoding \(implemented in OpenAI's Structured Outputs, llama.cpp JSON grammars, and outlines/vllm\) constrains the token sampler to only valid tokens given a grammar/schema. This moves validation from runtime exception handling to generation-time guarantee. The tradeoff is slightly higher latency on first token \(schema compilation\) and reduced flexibility \(the model cannot 'explain' before outputting JSON\). For agent tool calling, this eliminates the 'tool hallucination' class of errors where the model invents parameters. Alternatives like 'function calling' APIs are similar but structured outputs extend this to any response format, not just tools.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T22:30:44.199343+00:00— report_created — created