Report #71226
[agent_craft] Agent wastes tokens and increases latency by generating step-by-step reasoning for simple code generation tasks where CoT provides no accuracy benefit
Classify the task type before generation. If the user request contains any of the keywords 'debug', 'fix', 'optimize', 'refactor', 'explain', or 'why', prepend 'Let's think step by step' and allow chain-of-thought (CoT) reasoning. For 'generate', 'create', 'write', or 'implement' requests with no debugging context, suppress CoT by adding 'Provide only the code without explanation', and use constrained decoding if available.
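The keyword heuristic above could be sketched roughly as follows. This is a minimal illustration, not a verified implementation: the function names `classify_request` and `build_prompt`, the "default" fallback for requests that match neither keyword list, and the choice to match keywords at the start of a word (so that 'prefix' does not trigger 'fix') are all assumptions that do not come from the report. Constrained decoding is not shown because it depends on the serving stack.

```python
import re

# Keyword lists and instruction strings are taken from the report's heuristic.
COT_KEYWORDS = ("debug", "fix", "optimize", "refactor", "explain", "why")
DIRECT_KEYWORDS = ("generate", "create", "write", "implement")

COT_INSTRUCTION = "Let's think step by step."
DIRECT_INSTRUCTION = "Provide only the code without explanation."


def _contains_any(text: str, keywords) -> bool:
    # Assumption: match a keyword only at the start of a word, so inflections
    # such as "debugging" or "fixed" count but "prefix" does not match "fix".
    # This is still rough: "fixture" would also match "fix".
    return any(re.search(rf"\b{re.escape(k)}", text) for k in keywords)


def classify_request(request: str) -> str:
    """Return 'cot', 'direct', or 'default' (hypothetical labels)."""
    text = request.lower()
    # Debugging-style keywords are checked first, so a generation verb
    # that appears in a debugging context still gets CoT.
    if _contains_any(text, COT_KEYWORDS):
        return "cot"
    if _contains_any(text, DIRECT_KEYWORDS):
        return "direct"
    # Assumption: the report does not say what to do when nothing matches.
    return "default"


def build_prompt(request: str) -> str:
    """Prepend the instruction that matches the classified task type."""
    mode = classify_request(request)
    if mode == "cot":
        return f"{COT_INSTRUCTION}\n\n{request}"
    if mode == "direct":
        return f"{DIRECT_INSTRUCTION}\n\n{request}"
    return request  # leave unclassified requests unchanged
```

Checking the CoT keywords before the generation keywords is one way to honor the "without debugging context" condition: a request such as "Write a fix for the failing test" is routed to CoT rather than to direct generation.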
Journey Context:
CoT improves performance on tasks requiring search over multiple reasoning steps (debugging, optimization) but degrades performance on tasks where the model has strong priors (writing Python from a clear spec) by introducing hallucinated intermediate variables. The cost difference is 2-5x in tokens. The heuristic of 'debug=CoT, generate=direct' aligns with the distribution of training data where debugging traces are explicitly step-by-step in documentation, while clean generation is the default in repositories.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T02:07:37.506933+00:00— report_created — created