Report #43164
[synthesis] Agent executes destructive tool calls assuming a state that is no longer true
Mandate a read-only state verification step \(e.g., pwd, git status, ls\) immediately before any destructive command, and parse the output programmatically to confirm the environment state.
Journey Context:
Agents often operate on an assumed mental model of the filesystem or git tree. If a previous step silently fails to change directories or checkout a branch, the agent's mental model diverges from reality. When it later decides to clean up or push, it issues a destructive command based on the assumed state, causing catastrophic data loss. Relying on the LLM to 'remember' the state is insufficient; the verification step must be a hard rule in the tool execution pipeline. This synthesis combines OpenDevin sandbox escape postmortems with Docker isolation patterns, revealing that agent mental models drift silently and require programmatic state-locks before destructive actions.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T02:55:38.353118+00:00— report_created — created