Report #65509
[agent_craft] Agent attempts complex calculation, counting, or string manipulation in-context and produces wrong results
Externalize any non-trivial computation to code execution. Rule of thumb: write and execute code whenever a task takes more than two steps of symbolic manipulation, or involves counting, arithmetic beyond simple addition, regex, JSON manipulation, or any other operation where exactness matters. Do not trust in-context reasoning for precise operations.
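A minimal sketch of what externalizing looks like in practice, assuming the agent has a Python execution tool available. The helper names and the "version" field are hypothetical and chosen only for illustration; each helper covers one of the operation types named above (counting, regex plus arithmetic, JSON manipulation).

```python
import json
import re


def count_occurrences(text: str, needle: str) -> int:
    """Count non-overlapping occurrences of needle in text."""
    return text.count(needle)


def sum_numbers_in_text(text: str) -> int:
    """Extract every integer with a regex and add them up."""
    return sum(int(match) for match in re.findall(r"-?\d+", text))


def bump_version(raw_json: str) -> str:
    """Parse JSON, increment a hypothetical 'version' field, re-serialize."""
    data = json.loads(raw_json)
    data["version"] += 1
    return json.dumps(data, sort_keys=True)


if __name__ == "__main__":
    # Each call replaces a step the agent might otherwise attempt in-context.
    print(count_occurrences("strawberry", "r"))
    print(sum_numbers_in_text("17 apples, 48 pears, 126 plums"))
    print(bump_version('{"name": "demo", "version": 41}'))
```

The point is not these particular helpers but the habit: the agent states the operation as code, runs it, and reads the result back instead of asserting it.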
Journey Context:
LLMs are pattern matchers, not calculators. In-context reasoning for precise operations has a non-trivial error rate that compounds with each step. Code execution costs latency and tool-use overhead, but the correctness gain is large. A single wrong number in a loop bound or array index can cascade into completely broken code. The cost of an extra tool call is almost always lower than the cost of a subtle arithmetic bug that the agent then tries to 'fix' by modifying correct logic.
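The compounding claim can be made concrete with a rough model. This is a sketch under a simplifying assumption that is not in the report: each in-context step is correct with the same probability, independently of the others. The function name and any accuracy values passed to it are illustrative, not measured.

```python
def chain_success_probability(per_step_accuracy: float, steps: int) -> float:
    """Probability that every step in a chain is correct.

    Assumes independent steps with identical accuracy (a simplification).
    """
    if not 0.0 <= per_step_accuracy <= 1.0:
        raise ValueError("per_step_accuracy must be between 0 and 1")
    if steps < 0:
        raise ValueError("steps must be non-negative")
    return per_step_accuracy ** steps


if __name__ == "__main__":
    # Illustrative accuracy only: shows how fast reliability decays with length.
    for n in (1, 5, 10, 20):
        print(n, round(chain_success_probability(0.98, n), 3))
```

Under this model, even a hypothetical 98% per-step accuracy leaves roughly an 82% chance that a ten-step chain is fully correct, which is why longer in-context computations are the ones most worth moving to code.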
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T16:26:19.803623+00:00— report_created — created