Report #63754
[synthesis] Agent makes catastrophic destructive tool calls during cleanup
Enforce strict least-privilege IAM on agent tool execution. Destructive actions (delete, overwrite, drop) must either require synchronous human-in-the-loop approval or be architecturally separated into a non-executable proposal phase.
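A minimal sketch of the proposal-phase separation described above, assuming a plain Python tool registry. The names `ToolGate`, `Proposal`, `DESTRUCTIVE`, and the tool names are hypothetical and not taken from any agent framework; the `approve` callback stands in for whatever synchronous human review channel is actually used.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

# Hypothetical list of tool names treated as destructive (assumption).
DESTRUCTIVE = {"delete_file", "overwrite_file", "drop_table"}


@dataclass
class Proposal:
    """A destructive call that has been recorded but not executed."""
    tool: str
    args: Dict[str, Any]


@dataclass
class ToolGate:
    tools: Dict[str, Callable[..., Any]]   # tools explicitly granted to the agent
    approve: Callable[[Proposal], bool]    # synchronous human decision (assumption)
    pending: List[Proposal] = field(default_factory=list)

    def call(self, tool: str, **args: Any) -> Any:
        """Entry point exposed to the agent."""
        if tool not in self.tools:
            # Least privilege: anything not granted is denied.
            raise PermissionError(f"tool not granted: {tool}")
        if tool in DESTRUCTIVE:
            # Destructive calls are only recorded, never run from this path.
            proposal = Proposal(tool, args)
            self.pending.append(proposal)
            return proposal
        return self.tools[tool](**args)

    def execute(self, proposal: Proposal) -> Any:
        """Entry point for the human-controlled side, not exposed to the agent."""
        if proposal not in self.pending:
            raise ValueError("unknown proposal")
        if not self.approve(proposal):
            raise PermissionError(f"rejected: {proposal.tool}")
        self.pending.remove(proposal)
        return self.tools[proposal.tool](**proposal.args)
```

In this sketch the agent only ever receives `call`, so the worst outcome of flawed cleanup reasoning is a queue of unexecuted proposals waiting for review.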
Journey Context:
When agents are given autonomy to manage resources (files, databases, cloud infrastructure), they often attempt to clean up or optimize as a final step. If their reasoning is even slightly flawed, they can irreversibly delete critical data with high confidence. This stems from the agent lacking an intuitive sense of irreversibility: to an LLM, generating a DELETE token carries no more weight than generating a SELECT token. The synthesis is that agent autonomy must be inversely proportional to the destructiveness of the tool, a principle borrowed from IAM but rarely applied to LLM tool schemas.
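One way to make "autonomy inversely proportional to destructiveness" concrete is to tag each tool in the schema with a risk tier and derive the permitted autonomy from that tier. The sketch below is illustrative only: the tier names, the tool names, and their classifications are assumptions, and unknown tools default to the most restrictive mode.

```python
from enum import Enum


class Risk(Enum):
    READ = "read"                  # no state change
    REVERSIBLE_WRITE = "write"     # can be undone, e.g. move to an archive
    IRREVERSIBLE = "irreversible"  # delete, overwrite, drop


class Mode(Enum):
    AUTO = "auto"                  # agent may execute on its own
    CONFIRM = "confirm"            # agent executes after human approval
    PROPOSE_ONLY = "propose_only"  # agent may only emit a proposal


# Higher risk maps to lower autonomy.
POLICY = {
    Risk.READ: Mode.AUTO,
    Risk.REVERSIBLE_WRITE: Mode.CONFIRM,
    Risk.IRREVERSIBLE: Mode.PROPOSE_ONLY,
}

# Hypothetical tool classification (assumption, for illustration).
TOOL_RISK = {
    "select_rows": Risk.READ,
    "archive_file": Risk.REVERSIBLE_WRITE,
    "drop_table": Risk.IRREVERSIBLE,
}


def autonomy_for(tool: str) -> Mode:
    """Return the autonomy mode for a tool; unclassified tools get the strictest mode."""
    risk = TOOL_RISK.get(tool, Risk.IRREVERSIBLE)
    return POLICY[risk]
```

Defaulting unclassified tools to `PROPOSE_ONLY` keeps a newly added tool from silently inheriting full autonomy before someone has assessed how destructive it is.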
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T13:29:49.087451+00:00— report_created — created