Report #25428
[agent\_craft] Agent outputs verbose chain-of-thought reasoning inside the JSON/tool-call payload, breaking the parser or consuming 40%\+ of the output token budget
Enforce a strict separation: allow free-form chain-of-thought \(CoT\) in a dedicated 'reasoning' field or in a separate assistant message that precedes the tool call, and generate the final tool-call payload under JSON-mode or structured-output constraints so that the arguments must match the declared schema and nothing else. Teach the format in the system prompt, for example: 'First, explain your plan in tags. Then, output the tool call as valid JSON only.'
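A minimal sketch of how the client side could enforce that prompt format, using only the Python standard library. The tag name in the report's example prompt is not recoverable, so the `<plan>...</plan>` delimiter here is an assumption, as are the function name `split_plan_and_call` and the `name`/`arguments` envelope.

```python
import json
import re

# Hypothetical delimiter: <plan>...</plan> is an assumption made for this sketch.
PLAN_RE = re.compile(r"\A\s*<plan>(.*?)</plan>\s*", re.DOTALL)


def split_plan_and_call(raw: str) -> tuple[str, dict]:
    """Return (plan_text, tool_call); raise ValueError if the format is violated."""
    match = PLAN_RE.match(raw)
    if match is None:
        raise ValueError("missing <plan> block before the tool call")
    plan = match.group(1).strip()

    # Everything after the plan must be JSON only. Leading prose or trailing
    # text makes json.loads fail, which is the behaviour we want here.
    payload = raw[match.end():].strip()
    try:
        call = json.loads(payload)
    except json.JSONDecodeError as exc:
        raise ValueError(f"tool call is not valid JSON: {exc}") from exc

    # Assumed envelope: an object with exactly 'name' and 'arguments'.
    if not isinstance(call, dict) or set(call) != {"name", "arguments"}:
        raise ValueError("tool call must have exactly 'name' and 'arguments'")
    if not isinstance(call["arguments"], dict):
        raise ValueError("'arguments' must be a JSON object")
    return plan, call
```

The plan text can be logged or discarded, while only the parsed call is passed to the tool dispatcher. A rejected output can be retried with the error message appended to the conversation.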
Journey Context:
Many agents use ReAct-style prompting in which 'Thought:' and 'Action:' steps are interleaved. With function-calling APIs \(OpenAI, Anthropic\), the 'Thought' text often leaks into the 'arguments' string when the model is not disciplined. The result is either a JSONDecodeError or, worse, silent corruption: the payload parses, but an argument value contains text such as 'I will now call the tool to...'. The instinct is to suppress CoT entirely to save tokens and guarantee valid JSON, but this hurts accuracy, because the model needs to 'think' before acting to avoid hallucinated tool calls. The solution is architectural separation: keep CoT in the chat history or in a dedicated field, and use constrained generation for the tool call. OpenAI's JSON mode guarantees syntactically valid JSON, and 'strict: true' in function calling additionally holds the arguments to the declared schema; Anthropic's 'tool\_use' blocks keep tool calls separate from free-form text. Tradeoff: two separate generation steps increase latency \(two API calls, or a more complex single call with specific formatting\). Mitigate this with a single call and clear delimiters if the API returns reasoning separately from the answer \(for example Claude's extended thinking blocks, or a 'reasoning\_content' field on APIs that expose one\), or by streaming the CoT and then parsing the final JSON.
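The silent-corruption case described above passes JSON parsing, so it needs a separate check. The following is a rough sketch, assuming a flat mapping of expected argument names to Python types; the function name `find_argument_problems` and the marker phrases in `LEAK_RE` are illustrative assumptions and would need tuning for a real agent.

```python
import re

# Heuristic markers of leaked reasoning. This list is an assumption, not a
# complete or authoritative set; expect false positives and negatives.
LEAK_RE = re.compile(r"^\s*(thought:|action:|i will now|let me)", re.IGNORECASE)


def find_argument_problems(arguments: dict, expected: dict[str, type]) -> list[str]:
    """Return a list of human-readable problems; an empty list means no findings."""
    problems = []

    # Reject keys outside the schema, similar in spirit to disallowing
    # additional properties in a strict schema.
    for key in sorted(arguments.keys() - expected.keys()):
        problems.append(f"unexpected key: {key}")

    for key, expected_type in expected.items():
        if key not in arguments:
            problems.append(f"missing key: {key}")
            continue
        value = arguments[key]
        if not isinstance(value, expected_type):
            problems.append(f"wrong type for {key}: expected {expected_type.__name__}")
        elif isinstance(value, str) and LEAK_RE.search(value):
            # The value is a valid string, but it reads like narration.
            problems.append(f"possible leaked reasoning in: {key}")
    return problems
```

For example, `find_argument_problems({"query": "I will now call the tool to search", "limit": 5}, {"query": str, "limit": int})` flags the `query` value even though the payload is well-formed JSON of the right types.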
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T21:05:00.813108+00:00— report_created — created