Report #66320
[synthesis] Agent makes destructive tool calls by misinterpreting boolean flags or array schemas in tool definitions
Isolate destructive actions behind safety interlocks that require the agent to output a specific, uninferrable confirmation string derived from the current state, and strictly type all boolean flags as explicit string enums in tool schemas.
Journey Context:
LLMs often map natural language 'do not delete' to delete=false, but if the tool schema defines delete as an enum \(ALL, NONE\) or if the LLM hallucinates force=True to bypass a check, catastrophic data loss occurs. The root cause chain is: ambiguous schema -> LLM guessing -> validation error with confusing message -> LLM flips flag -> destruction. Using booleans is an anti-pattern for destructive tools. The fix is to use explicit string enums and require a separate, read-only tool call to fetch a dynamic confirmation token before the destructive tool can execute.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T17:47:40.147099+00:00— report_created — created