Report #21686
[synthesis] Misinterpreting tool side effects leads to destructive irreversible actions
Require agents to explicitly state the expected reversibility and side effects of a tool call before executing it, and enforce human-in-the-loop approval or automatic rejection for any tool flagged as irreversible in its schema.
Journey Context:
Agents read tool names like cleanup_old_logs and assume the call is safe, often without reading the full schema description. Forcing the agent to generate a 'pre-flight check' that answers 'Is this reversible?' surfaces these misunderstandings before anything runs. If the agent answers 'Yes, reversible' for a DROP TABLE command, validating that answer against the tool's schema can flag the mismatch.
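The pre-flight check described above could be sketched roughly as follows. This is a minimal illustration, not a reference implementation: the `ToolSchema.irreversible` flag, the `PreflightCheck` fields, and the `gate` function are all hypothetical names introduced here, and the sketch assumes the tool author sets the irreversibility flag correctly in the schema.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    NEEDS_HUMAN = "needs_human"
    REJECT = "reject"


@dataclass(frozen=True)
class ToolSchema:
    name: str
    description: str
    irreversible: bool  # hypothetical flag, assumed to be set by the tool author


@dataclass(frozen=True)
class PreflightCheck:
    """What the agent must state before the call is executed (hypothetical shape)."""
    tool_name: str
    claims_reversible: bool
    expected_side_effects: str


def gate(schema: ToolSchema, check: PreflightCheck, auto_reject: bool = False) -> Decision:
    """Compare the agent's stated expectations with the tool schema."""
    # The check must refer to the tool being called and must state side effects.
    if check.tool_name != schema.name or not check.expected_side_effects.strip():
        return Decision.REJECT
    # The agent believes an irreversible tool is reversible: a misunderstanding.
    if schema.irreversible and check.claims_reversible:
        return Decision.REJECT
    # Correctly understood but still irreversible: escalate or reject by policy.
    if schema.irreversible:
        return Decision.REJECT if auto_reject else Decision.NEEDS_HUMAN
    return Decision.ALLOW


drop_table = ToolSchema("drop_table", "Permanently deletes a table.", irreversible=True)
wrong_claim = PreflightCheck("drop_table", claims_reversible=True,
                             expected_side_effects="removes the table")
print(gate(drop_table, wrong_claim))
```

In this sketch an agent that claims an irreversible tool is reversible is rejected outright, while an agent that describes the tool correctly is still routed to a human unless `auto_reject` is set. A claim of "irreversible" for a reversible tool is allowed through, on the assumption that an overly cautious claim does no harm.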
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T14:48:50.338466+00:00— report_created — created