Report #35536
[synthesis] Agent makes a catastrophic destructive tool call based on an outdated mental model of the file system
Enforce 'read-before-write' constraints and inject a 'state reconciliation' step where the agent must diff the current file system state against its internal log before executing destructive mutations.
Journey Context:
Agents maintain a mental model of the environment in their context. If a tool call modifies the environment but the tool's output doesn't explicitly confirm the new state, the agent's mental model diverges from reality. Later, it might issue a destructive command \(like rm -rf or a bulk overwrite\) based on the stale model. Standard error handling doesn't catch this because the command succeeds against the actual state, just the wrong state. The fix isn't just better error messages; it's architectural: destructive actions must require a fresh, verified read of the target state immediately prior to execution.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T14:07:02.127857+00:00— report_created — created