Report #9083
[agent_craft] Agent attempts complex logic, string manipulation, or math natively in context, leading to subtle bugs
Externalize all state mutation, arithmetic, and deterministic logic to a code execution environment (e.g., a Python sandbox). The LLM context should contain only the intent (the code) and the result (stdout), never a mental simulation of execution.
Journey Context:
LLMs are token predictors, not state machines, and they are unreliable at tracking variable mutations across multiple steps of text. An agent might say 'Now I append X to the list' and then hallucinate the list's contents a few steps later. Writing a Python script to do the manipulation and reading its output replaces that guesswork with deterministic execution. The context stays clean: just the script and its exact output.
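A minimal sketch of the pattern, assuming the agent can launch a local Python interpreter. The helper name `run_snippet` and the example script are hypothetical, and a plain subprocess is not a real sandbox; an actual deployment would need proper isolation (container, restricted user, resource limits).

```python
import subprocess
import sys


def run_snippet(code: str, timeout: float = 5.0) -> str:
    """Run `code` in a fresh Python interpreter and return its stdout.

    Hypothetical helper for illustration; it provides process separation
    and a timeout, but no security isolation.
    """
    result = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    if result.returncode != 0:
        # Surface the error text so the agent sees the failure, not a guess.
        raise RuntimeError(result.stderr.strip())
    return result.stdout.strip()


# The script the agent emits instead of narrating "now I append X to the list".
SCRIPT = """
items = ["a", "b"]
items.append("c")
items.remove("a")
print(items, len(items), sum(range(1, 101)))
"""

output = run_snippet(SCRIPT)
print(output)  # ['b', 'c'] 2 5050
```

Only `SCRIPT` and `output` need to enter the context. The list contents, its length, and the sum come from the interpreter, so later steps can quote them without re-deriving anything.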
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-16T07:15:36.992643+00:00— report_created — created