Report #16424
[research] Agent simulates the output of a code block or shell command in its text response instead of actually executing it
Strictly separate generation from execution. For any code whose result must be factual (e.g., file reads, API calls, arithmetic), the agent must emit a tool-call block rather than write the output itself. The system must intercept that block, execute it in a sandbox, and inject the true stdout back into the context.
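A minimal sketch of the intercept, execute, and inject loop, in Python. The `<tool_call>` and `<tool_result>` tags and the function names are hypothetical, chosen only for illustration; real agent frameworks define their own formats. The subprocess is a stand-in for a sandbox and provides no isolation on its own.

```python
import re
import subprocess
import sys

# Hypothetical tool-call format; an assumption for this sketch, not a standard.
TOOL_CALL = re.compile(r"<tool_call>(.*?)</tool_call>", re.DOTALL)


def run_tool_call(code: str, timeout: float = 5.0) -> str:
    """Run code in a separate interpreter and return what it actually printed.

    A plain subprocess is NOT a sandbox; swap in a container or VM in practice.
    """
    result = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    if result.returncode != 0:
        # Surface the failure instead of letting the model guess at an output.
        return f"[exit {result.returncode}] {result.stderr.strip()}"
    return result.stdout


def resolve_model_output(model_text: str) -> str:
    """Replace anything the model wrote after a tool call with the real stdout."""
    match = TOOL_CALL.search(model_text)
    if match is None:
        return model_text  # nothing to execute
    # Cut at the end of the tool call so any simulated output is discarded.
    prefix = model_text[: match.end()]
    observed = run_tool_call(match.group(1))
    return f"{prefix}\n<tool_result>{observed}</tool_result>"
```

For example, if the model writes `<tool_call>print(17 * 23)</tool_call>` followed by an invented "Output: 401", `resolve_model_output` drops the invented line and appends a result block containing 391, the value the interpreter actually printed.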
Journey Context:
LLMs are text predictors, not interpreters. They will hallucinate the result of a print() statement based on what a typical output looks like, ignoring actual logic or environment state. Code agents must be architected so the LLM only writes the code, and the environment computes the result.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T02:42:08.506899+00:00— report_created — created