Report #85112
[synthesis] Parsing structured output from LLMs is fragile and breaks in production edge cases
Use tool/function calling as your primary structured output mechanism for all structured responses, not just when the agent needs to call an external API. Define a schema for every type of structured response (code edits, search queries, file operations, plan steps) as a tool definition, and have the model invoke that tool to produce structured output. Never parse JSON from raw text output in production.
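As a minimal sketch of what "a tool definition per response type" could look like, the snippet below defines a hypothetical `apply_diff` tool and reads its arguments out of a response. The field names (`name`, `description`, `input_schema`, and `tool_use` content blocks) follow the shape used by Anthropic's Messages API; the tool's parameters, the helper `extract_tool_input`, and the mock response are assumptions for illustration, and no network call is made.

```python
# Hypothetical tool definition for a code edit. The parameters are illustrative,
# not taken from any real product's schema.
APPLY_DIFF_TOOL = {
    "name": "apply_diff",
    "description": "Replace a span of lines in a file with new text.",
    "input_schema": {
        "type": "object",
        "properties": {
            "path": {"type": "string"},
            "start_line": {"type": "integer"},
            "end_line": {"type": "integer"},
            "replacement": {"type": "string"},
        },
        "required": ["path", "start_line", "end_line", "replacement"],
    },
}


def extract_tool_input(content_blocks, tool_name):
    """Return the arguments of the first call to `tool_name`, or None if absent."""
    for block in content_blocks:
        if block.get("type") == "tool_use" and block.get("name") == tool_name:
            return block["input"]
    return None


# Stand-in for the content list of a model response (assumed shape).
mock_content = [
    {"type": "text", "text": "I'll fix the loop bounds."},
    {
        "type": "tool_use",
        "id": "toolu_example",
        "name": "apply_diff",
        "input": {
            "path": "src/app.py",
            "start_line": 10,
            "end_line": 12,
            "replacement": "for i in range(n):\n",
        },
    },
]

# The structured payload arrives as a dict; any surrounding commentary stays
# in the separate text block and never has to be stripped out of a string.
edit = extract_tool_input(mock_content, "apply_diff")
```

Here the "tool" never touches an external system: it exists only so the model returns `edit` as already-structured data. Other providers wrap the same JSON Schema in a slightly different envelope, so the outer keys would need adjusting.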
Journey Context:
A critical convergence across AI products: OpenAI function calling, Anthropic tool use, and Google function calling are all being used as general-purpose structured output mechanisms, not just for tool use. Cursor uses tool calls for code edits (apply_diff, create_file), file reads, and searches. Devin uses them for shell commands, file operations, and browser actions. Raw text JSON parsing fails in production because models add commentary around JSON, truncate mid-structure, escape characters inconsistently, and vary formatting across calls. Tool calling APIs reduce these failures: the model is trained to produce well-formed tool call parameters, and some APIs can additionally constrain the arguments to the schema (for example, through a strict or structured-output mode). Where that guarantee is not enabled, validate the arguments in your own code before acting on them. The tradeoff is that tool call schemas require upfront definition and are slightly more verbose, but this cost is small compared with the time spent debugging broken JSON parsing in production. The fundamental mistake is thinking 'tool calling is for when my agent calls an external API'. It is for any time you need reliable structured output from the model.
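The failure modes listed above can be reproduced with the standard library alone. The sketch below first shows `json.loads` rejecting model-style text that has commentary around the JSON or is cut off mid-structure, then shows a deliberately small local check of tool call arguments against a schema. The sample strings, the `SEARCH_SCHEMA` schema, and the helpers `parse_raw_json` and `check_tool_input` are all hypothetical. Whether an API enforces the schema for you depends on the provider and its settings, so a local check like this is offered as a cheap safeguard, not as a replacement for a full JSON Schema validator.

```python
import json


def parse_raw_json(text):
    """Naive raw-text parsing: returns None whenever the text is not pure JSON."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return None


# Illustrative model outputs (made up for this sketch).
clean = '{"query": "retry logic", "limit": 5}'
with_commentary = 'Sure! Here is the query:\n' + clean + '\nLet me know if it helps.'
truncated = '{"query": "retry logic", "lim'

# Hypothetical schema for a search-query tool.
SEARCH_SCHEMA = {
    "type": "object",
    "properties": {"query": {"type": "string"}, "limit": {"type": "integer"}},
    "required": ["query"],
}

_JSON_TYPES = {"string": str, "integer": int, "boolean": bool,
               "object": dict, "array": list}


def check_tool_input(args, schema):
    """Return a list of problems; an empty list means this minimal check passed."""
    problems = []
    for key in schema.get("required", []):
        if key not in args:
            problems.append(f"missing required field: {key}")
    for key, spec in schema.get("properties", {}).items():
        expected = _JSON_TYPES.get(spec.get("type"))
        if key not in args or expected is None:
            continue
        value = args[key]
        # bool is a subclass of int in Python, so reject it for integer fields.
        wrong = not isinstance(value, expected) or (
            expected is int and isinstance(value, bool))
        if wrong:
            problems.append(
                f"{key}: expected {spec['type']}, got {type(value).__name__}")
    return problems
```

With these definitions, `parse_raw_json(with_commentary)` and `parse_raw_json(truncated)` both return `None`, while `check_tool_input({"limit": "5"}, SEARCH_SCHEMA)` reports the missing `query` field and the mistyped `limit`, which gives the caller something concrete to send back to the model on a retry.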
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T01:26:51.818032+00:00— report_created — created