Report #64683
[frontier] Malformed JSON in LLM tool calls causing execution failures and retry loops
Implement constrained decoding (grammar-based sampling) rather than post-hoc JSON validation. Use a library such as Outlines or Jsonformer to enforce the tool schema at the token-generation level, so the output is syntactically valid on the first generation. For the OpenAI API, set `strict: true` in the function definition, which enforces the schema during decoding.
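As a rough illustration of the `strict: true` option, the sketch below builds a tool definition in the Chat Completions function-calling shape and checks the two schema prerequisites that strict mode is documented to expect (every property listed in `required`, and `additionalProperties` set to false). The tool name `get_weather`, its fields, and the helper `strict_mode_problems` are hypothetical and exist only for this example; no API call is made.

```python
import json

# Hypothetical tool for illustration only; the name and fields are assumptions.
get_weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up the current weather for a city.",
        "strict": True,  # ask the API to enforce the schema during decoding
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string"},
                "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
            },
            # Strict mode expects every property to be required
            # and additionalProperties to be False.
            "required": ["city", "unit"],
            "additionalProperties": False,
        },
    },
}


def strict_mode_problems(tool):
    """Return reasons the top-level schema would not suit strict mode."""
    fn = tool["function"]
    params = fn["parameters"]
    problems = []
    if fn.get("strict") is not True:
        problems.append("strict is not True")
    if params.get("additionalProperties") is not False:
        problems.append("additionalProperties must be False")
    missing = set(params.get("properties", {})) - set(params.get("required", []))
    if missing:
        problems.append("properties not in required: %s" % sorted(missing))
    return problems


print(json.dumps(get_weather_tool, indent=2))
print(strict_mode_problems(get_weather_tool))
```

The helper only inspects the top level of the schema; nested objects would need the same checks applied recursively.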
Journey Context:
The standard approach is to ask the model to output JSON, validate it with Pydantic, and retry on failure. This wastes tokens, increases latency, and can fail repeatedly on complex nested schemas, which produces retry loops. Constrained decoding instead masks the logits at each step so that only tokens that keep the output valid against the schema can be sampled (e.g., closing braces when required, valid string escapes). This guarantees syntactic correctness and schema conformance for enum/const values on the first generation; it does not guarantee that the chosen values are right for the task. The journey involves moving from 'prompt engineering for valid JSON' to 'formal grammar constraints' and accepting the complexity of integrating grammar compilers (like Outlines' FSM or llama.cpp GBNF) into the inference stack. Essential for high-reliability agents where a malformed tool call is catastrophic.
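The logit-masking step described above can be sketched with a toy example. Everything here is an assumption made for illustration: the nine-token vocabulary, the hand-written state machine for a one-field schema, and the `fake_logits` stand-in for a model forward pass. Real systems such as Outlines or llama.cpp compile the state machine from the schema or grammar and operate on the model's actual tokenizer.

```python
import json
import math

# Toy vocabulary: one entry per "token". Real tokenizers are far larger and
# their tokens rarely align with JSON syntax this neatly.
VOCAB = ['{', '}', '"unit"', ':', '"celsius"', '"fahrenheit"',
         '"kelvin"', 'Sure!', '<eos>']

# Hand-written FSM for the schema {"unit": enum("celsius", "fahrenheit")}.
# state -> {allowed token: next state}
FSM = {
    0: {'{': 1},
    1: {'"unit"': 2},
    2: {':': 3},
    3: {'"celsius"': 4, '"fahrenheit"': 4},
    4: {'}': 5},
    5: {'<eos>': 6},
}
FINAL_STATE = 6


def fake_logits():
    """Stand-in for a model forward pass; deliberately prefers invalid tokens."""
    logits = [0.0] * len(VOCAB)
    logits[VOCAB.index('Sure!')] = 5.0      # chatty preamble
    logits[VOCAB.index('"kelvin"')] = 4.0   # value outside the enum
    logits[VOCAB.index('"celsius"')] = 3.0
    return logits


def constrained_decode():
    state, out = 0, []
    while state != FINAL_STATE:
        logits = fake_logits()
        allowed = FSM[state]
        # Mask every token the grammar does not permit in this state.
        masked = [logit if tok in allowed else -math.inf
                  for tok, logit in zip(VOCAB, logits)]
        # Greedy pick among the tokens that survived the mask.
        token = VOCAB[masked.index(max(masked))]
        if token != '<eos>':
            out.append(token)
        state = allowed[token]
    return ''.join(out)


result = constrained_decode()
print(result)
print(json.loads(result))
```

Without the mask, a greedy pick over the same logits would start with the `Sure!` token and later prefer `"kelvin"`, which is the kind of output that post-hoc validation has to reject and retry.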
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T15:03:15.925355+00:00— report_created — created