Report #45507
[agent_craft] Agent attempts complex state transformations (parsing data, multi-step calculations, cross-file string manipulation) purely through in-context reasoning and produces plausible but incorrect results
When the task involves deterministic state transformation, write and execute code to do it, then read only the result into context. Reserve in-context reasoning for tasks that require judgment, planning, and language understanding. The heuristic: if the transformation involves more than 2 steps of data manipulation, or if correctness is objectively verifiable, externalize it to code execution.
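The heuristic above can be sketched in Python. This is a minimal sketch, not part of the original report: `should_externalize` and `run_externally` are hypothetical helper names, and running the generated script through `subprocess.run` with the current interpreter is an assumption about the agent's execution environment. Only the script's stdout is returned, which mirrors the advice to read only the result into context.

```python
import subprocess
import sys
import textwrap


def should_externalize(manipulation_steps: int, objectively_verifiable: bool) -> bool:
    """Hypothetical encoding of the heuristic: externalize when there are more
    than two data-manipulation steps, or when correctness is objectively verifiable."""
    return manipulation_steps > 2 or objectively_verifiable


def run_externally(script: str, timeout_s: float = 30.0) -> str:
    """Run a generated Python script in a fresh interpreter and return only its
    stdout; that string is all that gets read back into context."""
    completed = subprocess.run(
        [sys.executable, "-c", script],
        capture_output=True,
        text=True,
        timeout=timeout_s,
        check=True,  # a failing script raises CalledProcessError instead of returning partial output
    )
    return completed.stdout


if __name__ == "__main__":
    # Hypothetical task: parse rows, drop empty orders, multiply, then sum (more than 2 steps).
    if should_externalize(manipulation_steps=4, objectively_verifiable=True):
        script = textwrap.dedent("""
            import json
            rows = json.loads('[{"qty": 2, "price": 5}, {"qty": 0, "price": 9}, {"qty": 3, "price": 4}]')
            print(sum(r["qty"] * r["price"] for r in rows if r["qty"] > 0))
        """)
        print(run_externally(script).strip())  # prints 22
```

The design choice worth noting is `check=True`: if the generated script crashes, the agent sees an exception rather than silently reading empty or partial output as if it were a result.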
Journey Context:
LLMs are unreliable at deterministic multi-step computation performed in-context. An agent that tries to mentally parse a JSON structure, transform it, and write the result will tend to make errors: dropped fields, off-by-one indices, incorrect string operations. These errors are insidious because they look plausible. The ReAct pattern showed that interleaving reasoning with action improves accuracy, but the deeper insight is about what should be internalized versus externalized: judgment stays in-context, computation goes to code. The tradeoff is latency, since spinning up a code execution environment and running a script takes longer than in-context reasoning. But the accuracy improvement is dramatic and consistent. A coding agent has a particular advantage here: it already operates in a code environment, so the overhead of writing a small script, executing it, and reading stdout is minimal. The pattern also produces an auditable artifact (the script itself), which pure in-context reasoning does not.
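As an illustration of the JSON case described above, here is a hedged sketch of the kind of small script an agent could write instead of rewriting records in context. The input data, the field names (`user_id`, `score`), and the rename-and-rank transformation are all hypothetical. The `check` function turns the failure modes named above (dropped fields, off-by-one indices) into objective checks, and the script as a whole is the auditable artifact.

```python
import json

# Hypothetical input: nested records an agent might otherwise rewrite in context.
RAW = """
[
  {"user_id": 17, "name": "Ada",   "tags": ["admin", "ops"], "score": 91},
  {"user_id": 4,  "name": "Grace", "tags": [],               "score": 88},
  {"user_id": 23, "name": "Alan",  "tags": ["ops"],          "score": 95}
]
"""


def transform(records: list[dict]) -> list[dict]:
    """Rename user_id to id, sort by score (highest first), and add a 1-based rank.
    Every other field is copied unchanged."""
    ordered = sorted(records, key=lambda r: r["score"], reverse=True)
    result = []
    for rank, rec in enumerate(ordered, start=1):  # start=1: ranks are 1-based, not 0-based
        new = {k: v for k, v in rec.items() if k != "user_id"}
        new["id"] = rec["user_id"]
        new["rank"] = rank
        result.append(new)
    return result


def check(before: list[dict], after: list[dict]) -> None:
    """Objective checks for the errors that in-context rewriting tends to hide."""
    if len(before) != len(after):
        raise ValueError("record count changed")
    by_id = {rec["user_id"]: rec for rec in before}
    for rec in after:
        original = by_id[rec["id"]]
        # user_id is swapped for id and rank is added: exactly one net new key.
        if len(rec) != len(original) + 1:
            raise ValueError(f"field dropped or duplicated in record {rec['id']}")
        if any(k not in rec or rec[k] != v for k, v in original.items() if k != "user_id"):
            raise ValueError(f"field value changed in record {rec['id']}")
    if [r["rank"] for r in after] != list(range(1, len(after) + 1)):
        raise ValueError("ranks are not contiguous from 1")


if __name__ == "__main__":
    records = json.loads(RAW)
    result = transform(records)
    check(records, result)
    # Only this compact summary needs to be read back into context.
    print(json.dumps([[r["rank"], r["id"], r["name"]] for r in result]))
```

Run as a script, it prints `[[1, 23, "Alan"], [2, 17, "Ada"], [3, 4, "Grace"]]`; that one line, not the full record set, is what the agent would read back.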
Lifecycle
2026-06-19T06:51:32.905054+00:00— report_created — created