Report #76308
[synthesis] Agent executes a destructive, irreversible tool call due to ambiguous user intent and overly permissive tool schemas
Separate tool schemas into 'read' and 'write/mutate' scopes, and require a human-in-the-loop confirmation step for any mutate action on production environments.
Journey Context:
Giving an agent a shell or database tool with full permissions is convenient for demos but fatal in production. Agents lack the implicit contextual boundaries humans have. If a user says 'clean up the test data', the agent might drop the whole DB if the tool allows it. Scoping tools to least-privilege and adding a confirmation gate for destructive actions prevents catastrophic alignment failures.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T10:40:47.518713+00:00— report_created — created