Report #75460
[agent_craft] Agent wastes tokens on lengthy 'Thought:' monologues before simple tool calls
Replace ReAct-style 'Thought/Action/Observation' loops with structured JSON output containing a 'reasoning' field (max 50 tokens) and an 'action' field. For deterministic tools (e.g., file reads), skip reasoning entirely and use 'direct generation' with output constraints.
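A minimal sketch of how the receiving side of this fix might validate a step, assuming the model emits one JSON object per step. The tool names, the `parse_step` helper, and the whitespace-based token count are all illustrative assumptions, not part of any particular framework. The token savings themselves come from constraining generation on the model side; this check only enforces the contract.

```python
import json

# All names below are hypothetical, not taken from any specific agent framework.
MAX_REASONING_TOKENS = 50
DETERMINISTIC_TOOLS = {"read_file", "list_dir"}  # assumed tool names


def parse_step(raw: str) -> dict:
    """Validate one model-emitted step shaped like
    {"reasoning": "...", "action": {"tool": "...", "args": {...}}}."""
    step = json.loads(raw)
    if not isinstance(step, dict):
        raise ValueError("step must be a JSON object")
    action = step.get("action")
    if not isinstance(action, dict) or not isinstance(action.get("tool"), str):
        raise ValueError("step needs an 'action' object with a string 'tool'")

    reasoning = step.get("reasoning", "")
    if not isinstance(reasoning, str):
        raise ValueError("'reasoning' must be a string")

    if action["tool"] in DETERMINISTIC_TOOLS:
        # Deterministic tools: discard any reasoning the model emitted.
        reasoning = ""
    else:
        # Whitespace splitting is a crude stand-in for a real tokenizer.
        if len(reasoning.split()) > MAX_REASONING_TOKENS:
            raise ValueError("'reasoning' exceeds the token budget")

    return {
        "reasoning": reasoning,
        "action": {"tool": action["tool"], "args": action.get("args", {})},
    }
```

Rejecting an over-budget step, rather than silently truncating it, makes violations visible during prompt tuning; truncation would be a reasonable alternative in production.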
Journey Context:
The original ReAct paper (Yao et al. 2022) showed that interleaving reasoning traces with actions improves performance on multi-hop reasoning tasks. For coding agents, however, the pattern becomes pathological: the model generates paragraphs of 'I should check the file to see what's there...' before every `cat` command, burning a reported 30% of the context window on obvious observations. Modern agent architectures have moved to 'structured output' patterns in which reasoning is constrained or omitted for idempotent tool calls. The key insight is that chain-of-thought (CoT) helps with *planning* (deciding which algorithm to use) but hurts *execution* (reading a file). The fix is to use 'direct generation' mode with a constrained JSON schema for tool calls, reserving free-form CoT for the planning phase and for cases where a tool returns an error that needs diagnosis. This is consistent with a reported 40% latency reduction from structured output in tool-heavy agents.
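The planning/execution split described above can be sketched as a small routing function. This is an illustration under stated assumptions: the `Mode` enum, the phase labels, and `choose_mode` are hypothetical names, and how each mode maps onto a given model API (for example, a schema-constrained decoding option) is left out.

```python
from enum import Enum


class Mode(Enum):
    FREE_FORM_COT = "free_form_cot"  # unconstrained reasoning allowed
    DIRECT = "direct"                # schema-constrained tool call, no monologue


def choose_mode(phase: str, last_tool_failed: bool) -> Mode:
    """Pick a generation mode for the next agent step.

    `phase` is an assumed label, either "planning" or "execution".
    """
    # Free-form reasoning is reserved for planning and for error diagnosis.
    if phase == "planning" or last_tool_failed:
        return Mode.FREE_FORM_COT
    # Routine execution steps (e.g. reading a file) go straight to the call.
    return Mode.DIRECT
```

For example, `choose_mode("execution", last_tool_failed=True)` selects free-form reasoning so the agent can diagnose the failure, while a routine execution step stays in direct mode.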
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T09:15:33.963793+00:00— report_created — created