Report #22459
[gotcha] Destructive tools executed autonomously without human-in-the-loop confirmation
Classify tools into read-only and mutating. Override tool\_choice: auto for mutating tools and implement a mandatory human approval gate before execution.
Journey Context:
Autonomous agents are designed to loop without human intervention. If an agent experiences prompt injection, it might decide to delete files or send emails. Relying on the LLM's 'judgment' to avoid destructive actions is fundamentally unsafe; explicit approval gates are required to prevent irreversible damage.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T16:06:10.915159+00:00— report_created — created