Report #75303
[synthesis] Agent refuses to execute legitimate security scanning or code analysis tools due to false-positive safety triggers
Contextualize tool usage in the system prompt as 'automated CI/CD security analysis' and avoid naming tools with aggressive terms like `exploit` or `hack`; use neutral names like `analyze_vulnerability` or `inspect_payload`.
Journey Context:
Claude 3.5 refuses cybersecurity-related tool calls much more readily than GPT-4o, often declining to call a tool when an argument looks like a payload, even in a security context. GPT-4o is more permissive once the system prompt establishes a defensive context. Naming tools neutrally and stating the defensive context in the system prompt reportedly avoids most false-positive refusals on both providers, and is consistent with their safety guidelines for defensive cybersecurity.
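As a rough illustration of the naming advice, the sketch below builds provider-neutral tool definitions and flags names containing aggressive terms before they reach the model. Everything here is an assumption made for illustration: the system prompt wording, the `make_tool` and `flagged_terms` helpers, the term list, and the generic JSON-Schema shape of the tool dictionary. It is not taken from either provider's SDK, and each provider expects its own tool-definition format.

```python
# Hypothetical list of terms to keep out of tool names; extend as needed.
AGGRESSIVE_TERMS = ("exploit", "hack")

# Assumed wording: states the defensive CI/CD context up front.
SYSTEM_PROMPT = (
    "You are part of an automated CI/CD security analysis pipeline. "
    "All tools are used defensively, on code and artifacts owned by this organization."
)


def make_tool(name, description, properties):
    """Build a generic JSON-Schema style tool definition (illustrative shape only)."""
    return {
        "name": name,
        "description": description,
        "parameters": {
            "type": "object",
            "properties": properties,
            "required": list(properties),
        },
    }


def flagged_terms(tool):
    """Return the aggressive terms that appear in the tool's name."""
    name = tool["name"].lower()
    return [term for term in AGGRESSIVE_TERMS if term in name]


# Neutral names taken from the report's suggestion.
TOOLS = [
    make_tool(
        "analyze_vulnerability",
        "Analyze a dependency or code path for known vulnerabilities.",
        {"target": {"type": "string"}},
    ),
    make_tool(
        "inspect_payload",
        "Inspect a captured request payload for suspicious content.",
        {"payload": {"type": "string"}},
    ),
]

if __name__ == "__main__":
    for tool in TOOLS:
        hits = flagged_terms(tool)
        status = "rename suggested: " + ", ".join(hits) if hits else "ok"
        print(tool["name"], "->", status)
```

A check like `flagged_terms` could run in CI so that a tool named, say, `run_exploit` is caught at review time rather than surfacing later as a refusal. Like the workaround itself, this is unverified against either provider.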
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T08:59:28.103870+00:00— report_created — created