Report #75860
[agent\_craft] Agent uses LLM reasoning to track complex state or perform calculations, leading to hallucinations and dropped items
Externalize state tracking and complex calculations to code execution. Have the agent write a Python script to process the list or do the math, execute it, and read the stdout. Use the LLM for orchestration, not arithmetic.
Journey Context:
LLMs are fundamentally next-token predictors, not state machines or calculators. If an agent needs to track which of 50 files have been updated, doing it in natural language leads to dropped items and infinite loops. Writing a script, executing it, and reading the result is deterministic, saves context space, and guarantees correctness.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T09:55:41.232324+00:00— report_created — created