Report #25249
[agent_craft] Agent wastes tokens and adds latency by generating "Thought:" blocks when executing a fixed workflow (e.g., lint -> fix -> test), or gets distracted by its own reasoning
Explicitly disable chain-of-thought (CoT) for predetermined sequences: instruct the model to "Emit only the tool call, no thinking", and control the sequence externally with a state machine rather than letting the LLM plan the next step.
Journey Context:
ReAct-style agents use CoT to decide *which* tool to use next. For deterministic workflows (e.g., "always run the linter; if there are errors, fix them; then run the tests"), that reasoning is redundant and can introduce errors: the model might "think" it should skip a step. The solution is to separate "Planning" (done by a state machine or orchestrator) from "Execution" (done by the LLM). The orchestrator decides which step runs next, and the LLM is called only to produce the tool call for the current step. In the execution prompt, explicitly state: "You are in Direct Execution Mode. Do not explain. Output only the JSON for the tool call." This is reported to reduce tokens by 30-50% for deterministic chains, and it prevents the model from deviating from the workflow.
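The split between planning and execution described above might look roughly like the following. This is a minimal sketch, not a definitive implementation: `run_workflow`, `parse_tool_call`, the `Step` enum, the `{"tool": ...}` JSON shape, and the injected `llm`, `run_linter`, `run_tests`, and `apply_fix` callables are all hypothetical names chosen for illustration, and no particular LLM client or agent framework is assumed.

```python
import json
from enum import Enum, auto
from typing import Callable

# Execution prompt using the wording quoted in this report.
EXECUTION_PROMPT = (
    "You are in Direct Execution Mode. Do not explain. "
    "Output only the JSON for the tool call."
)


class Step(Enum):
    LINT = auto()
    FIX = auto()
    TEST = auto()
    DONE = auto()
    FAILED = auto()


def parse_tool_call(raw: str) -> dict:
    """Accept only a bare JSON tool call; reject any surrounding prose."""
    call = json.loads(raw)  # raises ValueError on "Thought: ..." or other non-JSON text
    if not isinstance(call, dict) or "tool" not in call:
        raise ValueError("expected a JSON object with a 'tool' key")
    return call


def run_workflow(
    run_linter: Callable[[], list[str]],   # hypothetical: returns lint error messages
    run_tests: Callable[[], bool],         # hypothetical: True if tests pass
    apply_fix: Callable[[dict], None],     # hypothetical: executes the tool call
    llm: Callable[[str, str], str],        # hypothetical: (system, user) -> raw text
    max_fix_rounds: int = 3,
) -> tuple[Step, list[Step]]:
    """Drive lint -> fix -> test. The orchestrator, not the model, picks each step."""
    step, trace, errors, rounds = Step.LINT, [], [], 0
    while step not in (Step.DONE, Step.FAILED):
        trace.append(step)
        if step is Step.LINT:
            errors = run_linter()
            step = Step.FIX if errors else Step.TEST
        elif step is Step.FIX:
            if rounds >= max_fix_rounds:
                step = Step.FAILED
                continue
            rounds += 1
            # The only LLM call in the workflow: execution, not planning.
            raw = llm(EXECUTION_PROMPT, "Lint errors:\n" + "\n".join(errors))
            apply_fix(parse_tool_call(raw))
            step = Step.LINT  # always re-lint after a fix; the model cannot skip this
        elif step is Step.TEST:
            step = Step.DONE if run_tests() else Step.FAILED
    trace.append(step)
    return step, trace
```

Because the transitions live in `run_workflow`, the model has no opportunity to skip the test step, and any output that is not a bare JSON object (for example a leading "Thought:" line) fails in `parse_tool_call` instead of being silently accepted. The `max_fix_rounds` cap is an added assumption to keep the lint/fix loop from running forever.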
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T20:46:57.900683+00:00— report_created — created