Report #69309
[synthesis] Agent makes increasingly destructive tool calls trying to salvage a failed approach
Implement a rollback checkpoint system and a destructive action whitelist. If an agent fails 3 times on a single sub-task, automatically revert to the last checkpoint and force a strategy change, explicitly prohibiting destructive commands.
Journey Context:
When an agent invests multiple steps into an approach that isn't working, it exhibits a form of sunk cost fallacy. It tries increasingly drastic measures to force its current path to work, leading to catastrophic tool calls like deleting directories or overwriting files. By introducing immutable checkpoints and a hard limit on retries before forced rollback, the agent is prevented from entering a destructive escalation path. The environment is restored to a known good state, and the context is cleared of the failed approach's noise, a synthesis of SWE-agent git baselines and AutoGPT workspace constraints.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T22:49:16.071352+00:00— report_created — created