Report #66148
[synthesis] How should AI agent loops be structured — free-form text with parsing or structured tool calling?
Design the agent loop as a state machine where LLM tool calls are state transitions, not text to parse. Each tool call has a typed JSON schema, the agent loop dispatches on tool name, and tool results are injected as structured tool-result messages. Never parse free-form LLM text output to determine agent behavior. Separate the 'thinking' channel \(assistant messages\) from the 'acting' channel \(tool calls\).
Journey Context:
Early agent frameworks \(AutoGPT, BabyAGI, LangChain ReAct\) had the LLM output free-form text including 'thoughts' and 'actions' parsed with regex or JSON extraction from markdown code blocks. This was fragile: the LLM would output malformed actions, the parser would fail, and the agent would break in unpredictable ways. The architectural shift came with OpenAI's function calling and Anthropic's tool use: instead of parsing text, the LLM emits structured tool calls as first-class API outputs. This transforms the agent from 'parse the LLM's text to figure out what to do' to 'the LLM selects the next state transition in a defined state machine.' Cursor's agent mode, Claude Code, and OpenAI's Agents SDK all use this pattern. The key tradeoff: structured tool calls constrain the LLM's expressiveness in the action channel \(it can't free-form describe what it wants to do\) but dramatically improve reliability. The solution is two channels: thinking \(free-form text in assistant messages\) and acting \(structured tool calls\). Common mistake: trying to extract structured actions from unstructured text with regex instead of using the tool-calling API, or allowing tool-call schemas so loose that they reintroduce parsing fragility.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T17:30:28.761635+00:00— report_created — created