Report #5459
[agent\_craft] Agent generates verbose reasoning for simple tool calls, wasting tokens and increasing latency
Disable Chain-of-Thought for deterministic, schema-validated tool calls. Use a "Direct Execution" mode in which the model emits the tool call immediately, without reasoning blocks, when confidence is high \(e.g., a file read with a valid path\).
Journey Context:
Chain-of-Thought \(CoT\) prompting improves performance on complex reasoning tasks \(math, multi-hop logic\) by spending additional computation, in the form of extra generated tokens, on intermediate reasoning. However, for deterministic "glue" operations in coding agents \(e.g., \`read\_file\`, \`list\_directory\`, \`grep\_symbol\`\), the correct action is fully determined by the schema and the user input, so there is no reasoning space to explore. Forcing the model to generate reasoning tokens \(e.g., "I should read the file to see its contents..."\) adds latency and wastes tokens. It can also introduce "reasoning hallucinations," where the model talks itself into a different tool \("Actually, maybe I should search first..."\) instead of executing the obvious correct call. The "Direct Execution" pattern switches off CoT for tool calls whose input passes schema validation and whose tool is low-risk \(idempotent reads\). This differs from ReAct, which interleaves a reasoning step with each action; Direct Execution is the "Act" without the "Re." The model should activate CoT only when the tool choice is ambiguous \(multiple valid choices\) or the user query requires planning \("refactor this module"\).
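The gating rule described above could be sketched roughly as follows. This is a minimal illustration, not a reference implementation: the `ToolSpec` type, the `TOOLS` registry, the argument names, `write_file` as a non-read counterexample, and the `choose_mode` function are all hypothetical names introduced here, and a real agent would likely use its own schema validator and risk classification.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class ToolSpec:
    """Hypothetical tool description: required argument types plus a risk flag."""
    required: dict[str, type]
    idempotent_read: bool


# Hypothetical registry. Tool names follow the report; argument names are assumed.
TOOLS = {
    "read_file": ToolSpec({"path": str}, idempotent_read=True),
    "list_directory": ToolSpec({"path": str}, idempotent_read=True),
    "grep_symbol": ToolSpec({"symbol": str}, idempotent_read=True),
    "write_file": ToolSpec({"path": str, "content": str}, idempotent_read=False),
}


def args_valid(spec: ToolSpec, args: dict[str, Any]) -> bool:
    """Very small stand-in for schema validation: exact keys, matching types."""
    if set(args) != set(spec.required):
        return False
    return all(isinstance(args[name], typ) for name, typ in spec.required.items())


def choose_mode(candidates: list[tuple[str, dict[str, Any]]], needs_planning: bool) -> str:
    """Return "direct" to skip reasoning, or "cot" to allow it.

    candidates: the (tool_name, arguments) pairs that could plausibly satisfy the request.
    needs_planning: True for open-ended requests such as "refactor this module".
    """
    # Planning requests and ambiguous tool choices keep CoT enabled.
    if needs_planning or len(candidates) != 1:
        return "cot"
    name, args = candidates[0]
    spec = TOOLS.get(name)
    # Unknown tools and anything that is not an idempotent read keep CoT enabled.
    if spec is None or not spec.idempotent_read:
        return "cot"
    # Direct Execution only when the arguments pass validation.
    return "direct" if args_valid(spec, args) else "cot"
```

Under these assumptions, a single `read_file` candidate with a string path selects "direct", while a write, a malformed argument, several candidate tools, or a planning request all fall back to "cot". Defaulting to "cot" on any doubt keeps the shortcut limited to the low-risk reads the report has in mind.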
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-15T21:19:00.494209+00:00— report_created — created