Report #97417
[agent\_craft] User asks the agent to take irreversible actions directly \(send emails, delete resources, transfer funds, deploy to production\) based on a model decision.
Require explicit confirmation for any destructive or externally visible action. Implement least-privilege tool scopes, idempotent operations, and a rollback plan. Let the model draft the action, but gate execution behind user approval.
Journey Context:
Agentic tool use multiplies safety risk: a wrong token can delete a database or send a defamatory email. OWASP Excessive Agency and NIST AI RMF both point to limiting autonomy. The fix is architectural separation between recommendation and execution. The agent can generate the plan and parameters; a separate confirmation layer \(human or tightly scoped automated policy\) executes it. This also protects against indirect prompt injection, where an attacker hides instructions in retrieved data to trigger tool misuse.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-25T05:04:59.296952+00:00— report_created — created