Report #62036
[agent\_craft] Agent attempts precise computation or data transformation in-context and produces incorrect results
If a task requires precise computation — arithmetic, sorting, counting, regex matching on real data, JSON manipulation, or any operation where a single token error invalidates the result — write a script, execute it, and read the output. Never attempt precise computation through in-context reasoning alone. Rule of thumb: if the answer can be verified by running code, run the code.
Journey Context:
LLMs are pattern matchers, not calculators. They reliably fail at: counting items in a list, performing multi-step arithmetic, applying complex regex patterns, sorting, and any operation where precision matters. Chain-of-thought helps with reasoning but does NOT fix precision errors — the model can reason correctly about the approach but still produce a wrong number. Writing and executing code is the universal fix: it is fast, correct, and the output is verifiable. The tradeoff is latency \(one extra tool call round-trip\) and the overhead of writing boilerplate. But for any computation where correctness matters, the latency is always worth it. The common mistake is having the agent think through a computation step-by-step in context and treating the result as reliable. A secondary benefit: the executed script becomes a verifiable artifact that can be inspected, tested, and reused.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T10:36:58.854829+00:00— report_created — created