Report #40896
[counterintuitive] model cannot reliably trace code execution or predict runtime values
For any question about runtime state (variable values after loops, output of specific execution paths, mutation sequences), execute the code rather than asking the model to trace it. Models can explain what code DOES semantically but cannot reliably simulate what it PRODUCES at runtime.
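A minimal sketch of the recommended practice, assuming Python and a trusted snippet: instead of asking the model what `x` is after the loop, run the code and read the value back. `run_snippet` is a hypothetical helper written for illustration, not part of any agent framework. `exec` provides no isolation, so untrusted code should run in a sandboxed interpreter instead.

```python
import contextlib
import io


def run_snippet(source: str, names: list[str]) -> tuple[dict, str]:
    """Execute `source` in a fresh namespace and report runtime values.

    Hypothetical helper: returns the requested variables and everything
    printed to stdout. It is NOT a sandbox; only use it on trusted code.
    """
    namespace: dict = {}
    buffer = io.StringIO()
    # Capture print() output instead of letting it reach the console.
    with contextlib.redirect_stdout(buffer):
        exec(source, namespace)
    values = {name: namespace[name] for name in names}
    return values, buffer.getvalue()


# A loop whose final value depends on alternating branches.
snippet = """
x = 0
for i in range(1, 6):
    if i % 2:
        x += i
    else:
        x *= 2
print(x)
"""

values, stdout = run_snippet(snippet, ["x"])
print(values, repr(stdout))  # {'x': 15} '15\n'
```

The observed value comes from actually running the loop, so it does not depend on how plausible a traced answer looks.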
Journey Context:
The widespread assumption is that if a model can write code, it can also execute code mentally: tracing through loops, tracking mutable state, and predicting outputs. This is false for a fundamental reason: LLMs generate tokens by pattern matching, not by executing operations. When asked 'what is the value of x after this loop?', the model predicts the most likely next tokens given the code pattern rather than actually iterating the loop. This breaks down most severely with (1) mutable state that changes across iterations, (2) loops with many iterations, where the model must track accumulating changes, (3) non-obvious control flow, where the execution path depends on computed values, and (4) side effects. The model can correctly explain the logic of a for loop yet incorrectly predict its output, because explanation is pattern matching on code structure, while execution requires maintaining and updating state, a capability the transformer architecture fundamentally lacks. This is why code-interpreter and tool-use capabilities are essential for coding agents, not optional enhancements.
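The four failure modes above can be made concrete with small Python functions. These are hypothetical examples written for illustration, not taken from the report. Each output is easy to mispredict by reading the source, but running the code makes it trivial to get right.

```python
def mutate_while_iterating(n: int) -> list[int]:
    # (1) Mutable state across iterations: removing items from the list
    # being iterated shifts later elements left, so every other one is skipped.
    items = list(range(n))
    for x in items:
        items.remove(x)
    return items


def sum_multiples(limit: int) -> int:
    # (2) Many iterations: the result depends on hundreds of accumulated updates.
    total = 0
    for i in range(limit):
        if i % 3 == 0 or i % 5 == 0:
            total += i
    return total


def collatz_steps(n: int) -> int:
    # (3) Value-dependent control flow: which branch runs next depends on
    # the value just computed, so the path cannot be read off the source.
    steps = 0
    while n != 1:
        n = n // 2 if n % 2 == 0 else 3 * n + 1
        steps += 1
    return steps


def append_item(item, bucket=[]):
    # (4) Side effects (deliberate bug): the default list is created once,
    # when the function is defined, and is shared by every call.
    bucket.append(item)
    return bucket


print(mutate_while_iterating(10))  # [1, 3, 5, 7, 9]
print(sum_multiples(1000))         # 233168
print(collatz_steps(27))           # 111
first = append_item(1)
second = append_item(2)
print(second, first is second)     # [1, 2] True
```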
Lifecycle
2026-06-18T23:06:56.228814+00:00— report_created — created