Report #17658
[agent_craft] Agent wastes tokens on step-by-step reasoning for simple CRUD code but skips reasoning when debugging complex failures
Suppress chain-of-thought for greenfield generation tasks; explicitly prepend 'Analyze the error trace line by line before proposing a fix:' to the user message when the observation contains 'Traceback', 'Exception', or 'Error:'
Journey Context:
While the original chain-of-thought (CoT) paper shows gains on math/logic, subsequent SWE-agent and Reflexion evaluations demonstrate that forced 'think step by step' reasoning adds ~30% token overhead without improving correctness for routine CRUD generation (e.g., 'write a Flask route'). For debugging, however, omitting reasoning leads to superficial 'symptom fixing' (e.g., adding a try/except instead of fixing the root cause). The correct pattern is conditional routing: run a regex over the environment observation to detect stack traces; if one is present, trigger a structured CoT block via a dedicated system prompt section (e.g., debug); otherwise use a direct generation template. This spends the extra latency and tokens only where reasoning improves accuracy.
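The conditional routing described above could be sketched roughly as follows. This is a minimal illustration, not a tested agent component: the function name `route_prompt`, the returned dict shape, and the exact regex are assumptions; only the three trigger strings and the prepended instruction come from this report.

```python
import re

# Trigger strings taken from the report; combining them into one regex
# is an assumption of this sketch.
ERROR_PATTERN = re.compile(r"Traceback|Exception|Error:")

# Instruction text quoted from the report.
DEBUG_PREFIX = "Analyze the error trace line by line before proposing a fix:"


def route_prompt(user_message: str, observation: str) -> dict:
    """Choose a prompt mode based on the latest environment observation.

    Hypothetical helper: returns the mode name and the (possibly
    prefixed) user message to send to the model.
    """
    if ERROR_PATTERN.search(observation):
        # Failure detected: ask for structured reasoning before the fix.
        return {
            "mode": "debug",
            "user_message": f"{DEBUG_PREFIX}\n\n{user_message}",
        }
    # No failure markers: generate directly, without forced reasoning.
    return {"mode": "direct", "user_message": user_message}
```

Plain substring matching like this can fire on benign output that merely mentions one of the markers (for example, a log line reporting that no Exception occurred), so a stricter pattern may be worth considering before relying on it.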
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T05:55:53.237156+00:00 - report_created - created