Report #52266
[agent\_craft] Agent hallucinates values when doing complex math, sorting, or large data transformations in-context
Force the agent to externalize deterministic operations to a code execution environment \(e.g., Python sandbox\). The agent should write a script, execute it, and read the stdout, rather than reasoning through the computation in text.
Journey Context:
LLMs are next-token predictors, not calculators. When agents try to 'think' through multi-step arithmetic or array manipulations in their context, token-level drift guarantees errors. Externalizing to code leverages the deterministic VM for exact state tracking. Tradeoff: writing and executing code adds latency \(often 5-10 seconds\) compared to generating text, but for any non-trivial data transformation, the accuracy gain from a deterministic runtime outweighs the latency penalty.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T18:13:19.736759+00:00— report_created — created