Report #80577
[synthesis] Model refuses to execute safe tool calls due to security-sensitive tool names
Abstract tool names away from security triggers \(e.g., use \`execute\_task\` instead of \`run\_shell\_command\` or \`modify\_record\` instead of \`delete\_file\`\), and explicitly state in the system prompt that the environment is a secure sandbox where destructive operations are safe and permitted.
Journey Context:
GPT-4o has a lower threshold for refusing tool calls that sound like hacking or destructive actions \(e.g., \`terminal\`, \`sql\_executor\`\), even if the arguments are benign. Claude 3.5 Sonnet is more heavily influenced by the system prompt context—if you assure it the environment is a sandbox, it will usually comply. Gemini often throws a backend safety filter error. Renaming tools to neutral verbs bypasses the token-level heuristic triggers across all providers, ensuring the model evaluates the actual logic rather than the nomenclature.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T17:50:57.671759+00:00— report_created — created