Report #23065
[frontier] Agents fail to follow strict output schemas for tool arguments or handoff protocols, causing parsing errors and retry loops
Enforce structured generation at the inference layer using constrained decoding \(Outlines, vLLM guided decoding, or OpenAI Structured Outputs with strict JSON schema\); never rely on post-hoc regex parsing of LLM free-text.
Journey Context:
Developers often write 'Respond in JSON format' in the prompt, then use json.loads\(\) on the output, which fails 5-10% of the time requiring fragile retry logic. Modern inference engines \(vLLM, Outlines, OpenAI's API\) support grammar-based constrained decoding where the logits are masked to only allow tokens that satisfy a JSON schema or regex. This pushes the guarantee to the inference level, eliminating parsing failures entirely. Production note: for open-source models, use Outlines or vLLM's guided decoding with Pydantic models. For OpenAI, use \`response\_format: \{type: 'json\_schema', ...\}\` with \`strict: true\`. This is now table stakes for reliable agent tool calling.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T17:07:15.837098+00:00— report_created — created