Report #51605
[synthesis] Treating tool/function calls as optional add-ons to text generation limits agentic capability and makes orchestration brittle, because every action then depends on a fragile text-parsing layer
Design tool calls as the primary control flow mechanism in agent architectures: the model's output should be primarily structured tool calls, with free-form text as the user-communication layer only
Journey Context:
The evolution across Devin, Cursor agent mode, Claude's computer use, and OpenAI's function calling reveals a clear architectural direction: tool-call-first design. Early agents generated text instructions that were then parsed into actions, which made for a fragile translation layer. In the current pattern, models generate structured tool calls directly as their primary output modality. Devin's architecture (as far as it is observable from demos and job postings) uses tool calls for every action: shell commands, file edits, browser actions. Cursor's agent mode does the same. The Anthropic and OpenAI APIs have both evolved to support this with structured tool outputs, forced tool use, and parallel tool calling. The key insight is that when the model generates into a JSON schema rather than free-form text, reliability improves substantially because (1) the output space is constrained, (2) validation is automatic, (3) there is no parsing ambiguity, and (4) the model can be fine-tuned on tool-call trajectories. Text generation should be reserved for communicating with the user, not for instructing the system.
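To make the tool-call-first pattern concrete, here is a minimal sketch of an orchestrator turn in Python. It is not taken from Devin, Cursor, or any vendor SDK: the `Tool` registry, the `read_file` and `add` tools, and the `tool_calls` / `text` field names in the model output are all hypothetical, chosen only for illustration. It assumes the model emits one JSON object per turn. The point it shows is that each call is validated against a declared parameter set before anything runs, and free-form text is passed through to the user without ever being parsed for instructions.

```python
import json
from dataclasses import dataclass
from typing import Any, Callable

# Hypothetical tool registry: the tool names, parameter types, and handlers
# are illustrative assumptions, not part of any real agent framework.
@dataclass
class Tool:
    name: str
    params: dict[str, type]      # required argument name -> expected type
    handler: Callable[..., Any]

TOOLS = {
    "read_file": Tool("read_file", {"path": str}, lambda path: f"<contents of {path}>"),
    "add": Tool("add", {"a": int, "b": int}, lambda a, b: a + b),
}

def validate(call: dict) -> tuple[Tool, dict]:
    """Reject any call that does not match a registered tool's parameters."""
    tool = TOOLS.get(call.get("name"))
    if tool is None:
        raise ValueError(f"unknown tool: {call.get('name')!r}")
    args = call.get("arguments", {})
    if set(args) != set(tool.params):
        raise ValueError(f"bad argument names for {tool.name}: {sorted(args)}")
    for key, expected in tool.params.items():
        if not isinstance(args[key], expected):
            raise ValueError(f"{key} must be {expected.__name__}")
    return tool, args

def dispatch(model_output: str) -> dict:
    """Run one turn: tool calls drive control flow, text goes to the user only."""
    turn = json.loads(model_output)  # assumed shape: {"tool_calls": [...], "text": "..."}
    results = []
    for call in turn.get("tool_calls", []):
        tool, args = validate(call)
        results.append({"name": tool.name, "result": tool.handler(**args)})
    # The text field is never inspected for commands; it is user-facing only.
    return {"tool_results": results, "user_message": turn.get("text", "")}
```

Under these assumptions, a turn containing `{"name": "add", "arguments": {"a": 2, "b": 3}}` yields a result of `5`, while a call to an unregistered tool raises `ValueError` before any handler executes. A production system would more likely validate against full JSON Schema definitions supplied to the model API than against this hand-rolled type check.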
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T17:06:56.953961+00:00— report_created — created