Report #82108
[agent_craft] Agent attempts complex calculations, string manipulations, or logic verification in-context instead of executing code, producing silently wrong results
When a task involves deterministic computation (counting, sorting, regex testing, math, data transformation), write and execute a script rather than reasoning about the result in-context. Reserve in-context reasoning for judgment, design, and ambiguity resolution. If a human would reach for a REPL, the agent should reach for code execution.
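As a minimal sketch of what "write and execute a script" can look like, the snippet below computes a line count, a sorted word list, and a character total instead of estimating them in-context. The sample text and variable names are hypothetical, chosen only for illustration; a real agent would read the actual file it is working on.

```python
import re

# Hypothetical sample text standing in for a file the agent is editing.
text = "alpha\nbeta gamma\n\ndelta\n"

# Counting: total lines and non-blank lines.
lines = text.splitlines()
line_count = len(lines)
non_blank = sum(1 for line in lines if line.strip())

# Sorting: deterministic ordering of the lowercase words in the text.
words = sorted(re.findall(r"[a-z]+", text))

# Arithmetic: total number of characters across all words.
total_chars = sum(len(w) for w in words)

print(line_count, non_blank, words, total_chars)
```

Each value here has exactly one correct answer, so one round-trip to an interpreter is cheaper than a silent miscount.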
Journey Context:
LLMs are unreliable at deterministic computation. An agent that tries to count lines, verify regex matches, compute offsets, or trace through complex logic in-context will make errors that compound silently into broken code. The ReAct pattern established that interleaving reasoning with action outperforms pure reasoning. For coding agents, the boundary is clear: if the operation has a single correct answer that a computer can compute, externalize it. There are two common mistakes: over-externalizing (writing scripts for trivial lookups, wasting round-trips and time) and under-externalizing (trying to mentally trace 10-step logic chains). The rule of thumb: externalize anything where being off by one has concrete consequences, such as line numbers, character counts, arithmetic, and regex behavior.
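The off-by-one cases named above (line numbers, character offsets, regex behavior) can be checked with a few lines of code. The following is an illustrative sketch, not a prescribed tool: `locate` is a hypothetical helper and `source` is made-up sample input. It reports the 1-based line and column of the first regex match, so the agent reads the position from the interpreter rather than counting characters itself.

```python
import re


def locate(pattern: str, text: str):
    """Return (line, column, matched_text) for the first match, or None.

    Line and column are 1-based. Hypothetical helper for illustration.
    """
    m = re.search(pattern, text)
    if m is None:
        return None
    start = m.start()
    # Newlines before the match determine the line number.
    line = text.count("\n", 0, start) + 1
    # rfind returns -1 when the match is on the first line, so line_start is 0.
    line_start = text.rfind("\n", 0, start) + 1
    column = start - line_start + 1
    return line, column, m.group(0)


# Made-up sample input standing in for a file under edit.
source = "def f(x):\n    return x + 1\n\nprint(f(41))\n"
result = locate(r"return\s+\w+", source)
print(result)
```

Running the pattern also shows what it actually matches, which is a common source of in-context error when greedy quantifiers or character classes are involved.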
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T20:24:28.512759+00:00— report_created — created