Report #56675
[synthesis] Agent uses free-form text output that must be parsed to extract actions
Use tool calls / function calling as the primary output format for agent actions. Define a constrained action space as typed tool schemas. This forces the model to produce parseable, validated intermediate states rather than unstructured text requiring fragile regex parsing.
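As a minimal sketch of what a constrained action space of typed tool schemas could look like, the Python below models two tools as dataclasses and validates an incoming tool call against them. The tool names echo the ones mentioned in this report. The field names (`path`, `new_content`, `command`, `timeout_s`), the `TOOLS` registry, and `parse_tool_call` are hypothetical and chosen only for illustration. They are not any vendor's actual schema.

```python
from dataclasses import dataclass, fields
from typing import Any, Union


# Hypothetical tool schemas: each dataclass is one allowed action.
@dataclass(frozen=True)
class EditFile:
    path: str
    new_content: str


@dataclass(frozen=True)
class RunCommand:
    command: str
    timeout_s: int = 30


Action = Union[EditFile, RunCommand]

# The registry is the whole action space; anything not listed is rejected.
TOOLS: dict[str, type] = {"edit_file": EditFile, "run_command": RunCommand}


def parse_tool_call(name: str, arguments: dict[str, Any]) -> Action:
    """Return a typed action for a tool call, or raise ValueError."""
    if name not in TOOLS:
        raise ValueError(f"unknown tool: {name!r}")
    cls = TOOLS[name]

    # Reject argument names the schema does not declare.
    unknown = set(arguments) - {f.name for f in fields(cls)}
    if unknown:
        raise ValueError(f"unexpected arguments for {name}: {sorted(unknown)}")

    try:
        action = cls(**arguments)
    except TypeError as exc:  # raised when a required argument is missing
        raise ValueError(f"invalid arguments for {name}: {exc}") from exc

    # Dataclasses do not enforce annotations at runtime, so check types here.
    for f in fields(cls):
        if not isinstance(getattr(action, f.name), f.type):
            raise ValueError(f"{name}.{f.name} must be {f.type.__name__}")
    return action
```

With this shape, `parse_tool_call("run_command", {"command": "ls"})` yields a `RunCommand` value, while an unknown tool name or a missing field raises before anything is executed.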
Journey Context:
Early agent frameworks (AutoGPT, BabyAGI) let the LLM emit free-form text and then parsed that text to extract actions. This is fragile: the output format drifts, parsing breaks, and nothing validates the result. The architectural shift visible across successful products is to use tool calls as the output format. Anthropic's tool use documentation explicitly recommends this pattern. Devin's action space is a constrained vocabulary of tool calls (edit_file, run_command, browse). Cursor's edit format is essentially a typed tool call schema. The synthesis: tool calls serve a dual purpose. They trigger actions, and they force the model into structured output. This is a correctness mechanism, not just a convenience. When the model must output a typed tool call, an ambiguous or malformed action specification fails schema validation instead of reaching the executor. The tradeoff is reduced flexibility, since the model can only express what the tool schema allows, but this constraint is a feature for reliable agent behavior. Free-form output is appropriate for final responses to users, not for intermediate agent actions.
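The contrast described above can be sketched in a few lines of Python. The first half shows a regex-based parser of the kind early frameworks relied on, which silently returns nothing once the model's wording drifts. The second half shows a turn handler that treats free-form text as a final answer and dispatches structured tool calls by name. The `Action: name(args)` text format, the turn dictionary layout (`type`, `name`, `arguments`, `text`), and the `HANDLERS` table are assumptions made for this illustration, not the format of any particular framework or API. The handler only simulates execution.

```python
import re
from typing import Callable, Optional

# Free-form approach: expects a line that looks exactly like "Action: tool(args)".
ACTION_RE = re.compile(r"^Action: (\w+)\((.*)\)$", re.MULTILINE)


def parse_free_form(text: str) -> Optional[tuple[str, str]]:
    """Extract (tool, raw_args) from model text, or None if the format drifted."""
    match = ACTION_RE.search(text)
    return (match.group(1), match.group(2)) if match else None


# Structured approach: handlers keyed by tool name (hypothetical, simulated).
HANDLERS: dict[str, Callable[..., str]] = {
    "run_command": lambda command: f"ran {command}",
}


def handle_turn(turn: dict) -> str:
    """Return final text as-is; dispatch intermediate tool calls by name."""
    if turn["type"] == "text":
        # Free-form output is reserved for the final response to the user.
        return turn["text"]
    if turn["type"] == "tool_call":
        handler = HANDLERS.get(turn["name"])
        if handler is None:
            raise ValueError(f"unknown tool: {turn['name']!r}")
        # A wrong or missing argument name raises TypeError here, loudly.
        return handler(**turn["arguments"])
    raise ValueError(f"unknown turn type: {turn['type']!r}")
```

In this sketch, `parse_free_form("I will now run_command(ls)")` returns `None` with no indication of what went wrong, whereas a tool call naming an unregistered tool raises immediately in `handle_turn`.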
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T01:37:22.639462+00:00— report_created — created