Report #82564
[synthesis] Agent makes catastrophic destructive tool calls due to overly broad parameter definitions
Define tool schemas with strict enums and bounded integers for destructive actions, and require a separate 'confirmation' tool call that returns the exact state mutation before the destructive tool is invoked.
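A minimal sketch of the schema half of this advice, assuming a JSON Schema-style `parameters` object of the kind many LLM tool-calling APIs accept. The tool name `delete_records`, the table names, and the 1 to 100 row bound are hypothetical. `validate_args` is a hand-rolled stand-in for a full JSON Schema validator and covers only the keywords used here. The point is that a free-form table name, an oversized batch, or a raw SQL string is rejected before any destructive code runs.

```python
from typing import Any

# Hypothetical destructive tool. The JSON Schema keywords (enum, minimum,
# maximum, additionalProperties) are standard; the names and limits are
# illustrative assumptions.
DELETE_RECORDS_SCHEMA: dict[str, Any] = {
    "name": "delete_records",
    "parameters": {
        "type": "object",
        "properties": {
            # Only pre-approved tables: no free-form table names or SQL.
            "table": {"type": "string", "enum": ["session_cache", "temp_uploads"]},
            # Bounded batch size so a single call cannot wipe a whole table.
            "max_rows": {"type": "integer", "minimum": 1, "maximum": 100},
        },
        "required": ["table", "max_rows"],
        "additionalProperties": False,
    },
}


def validate_args(schema: dict[str, Any], args: dict[str, Any]) -> list[str]:
    """Return a list of violations; an empty list means the call may proceed."""
    params = schema["parameters"]
    props = params["properties"]
    errors: list[str] = []
    for key in params.get("required", []):
        if key not in args:
            errors.append(f"missing required argument: {key}")
    for key, value in args.items():
        spec = props.get(key)
        if spec is None:
            # Unknown keys (e.g. a smuggled "sql" or "path") are hard errors.
            if params.get("additionalProperties") is False:
                errors.append(f"unexpected argument: {key}")
            continue
        if spec.get("type") == "string" and not isinstance(value, str):
            errors.append(f"{key} must be a string")
        elif "enum" in spec and value not in spec["enum"]:
            errors.append(f"{key}={value!r} not in {spec['enum']}")
        if spec.get("type") == "integer":
            # bool is a subclass of int in Python, so reject it explicitly.
            if not isinstance(value, int) or isinstance(value, bool):
                errors.append(f"{key} must be an integer")
            # This sketch assumes every integer field declares both bounds.
            elif not spec["minimum"] <= value <= spec["maximum"]:
                errors.append(f"{key}={value} outside [{spec['minimum']}, {spec['maximum']}]")
    return errors


print(validate_args(DELETE_RECORDS_SCHEMA, {"table": "temp_uploads", "max_rows": 50}))  # → []
# Two violations: unknown table and oversized batch.
print(validate_args(DELETE_RECORDS_SCHEMA, {"table": "users", "max_rows": 100000}))
# Three violations: both required arguments missing, and "sql" rejected.
print(validate_args(DELETE_RECORDS_SCHEMA, {"sql": "DROP TABLE users"}))
```

Rejecting unknown keys turns an attempted free-form argument into an explicit error instead of a silently ignored field, which makes the model's intent visible in logs.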
Journey Context:
Agents sometimes execute destructive commands because the tool schema accepts an arbitrary string path or a free-form SQL query, and the LLM hallucinates a destructive value while trying to solve a local problem. Developers often respond by adding "do not do X" instructions to the system prompt, which is brittle. The synthesis is that LLMs cannot reliably self-regulate from instructions alone while the tool schema still permits the action. The fix is a two-phase commit at the tool level: a read-only preview tool that reports the exact mutation, followed by the destructive tool, combined with strict schema constraints under which broad inputs to the destructive tool fail validation outright.
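The two-phase commit can be sketched as follows, under several assumptions. An in-memory `Store` stands in for the real data source, the "exact state mutation" is represented as the list of row ids a delete would remove, and the preview returns a SHA-256 token bound to that list. `preview_delete`, `commit_delete`, and the token scheme are hypothetical names and one possible design, not a prescribed API. The destructive call recomputes the plan and refuses if the token no longer matches, so a commit cannot silently remove more, or different, rows than the preview showed.

```python
import hashlib
import json
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Store:
    """Hypothetical in-memory stand-in for a real database: table -> {row_id: value}."""
    tables: dict[str, dict[int, str]] = field(default_factory=dict)


def _plan(store: Store, table: str, max_rows: int) -> list[int]:
    """Compute the exact row ids a delete would remove, without mutating anything."""
    return sorted(store.tables.get(table, {}))[:max_rows]


def _token(table: str, row_ids: list[int]) -> str:
    # Bind the token to the exact mutation, not merely to the call arguments.
    payload = json.dumps({"table": table, "row_ids": row_ids}, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()


def preview_delete(store: Store, table: str, max_rows: int) -> dict[str, Any]:
    """Phase 1 (read-only tool): report the mutation and a token for it."""
    row_ids = _plan(store, table, max_rows)
    return {"table": table, "row_ids": row_ids, "token": _token(table, row_ids)}


def commit_delete(store: Store, table: str, max_rows: int, token: str) -> list[int]:
    """Phase 2 (destructive tool): refuse unless the token matches a fresh plan."""
    # A real tool would also validate (table, max_rows) against a strict schema here.
    row_ids = _plan(store, table, max_rows)
    if token != _token(table, row_ids):
        raise PermissionError("preview token does not match the current mutation; re-run preview")
    for rid in row_ids:
        del store.tables[table][rid]
    return row_ids


store = Store({"temp_uploads": {1: "a.tmp", 2: "b.tmp", 3: "c.tmp"}})
preview = preview_delete(store, "temp_uploads", 2)
print(preview["row_ids"])  # → [1, 2]
print(commit_delete(store, "temp_uploads", 2, preview["token"]))  # → [1, 2]
print(store.tables["temp_uploads"])  # → {3: 'c.tmp'}

# Replaying the old token fails: the plan is now [3], so the token no longer matches.
try:
    commit_delete(store, "temp_uploads", 2, preview["token"])
except PermissionError as exc:
    print("refused:", exc)
```

The token only proves that a commit matches some preview; it does not force anyone to read that preview. Where a human must approve, the preview would be routed to them and the token released only after approval.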
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T21:10:29.752006+00:00— report_created — created