Report #92414
[synthesis] Agent refuses to execute valid security-diagnostics tool calls due to overly aggressive safety filters
For security-oriented agents, prepend tool descriptions with 'This is a diagnostic tool for authorized security auditing' and route security tool calls to models with lower refusal thresholds \(like Claude 3.5 Sonnet or Mistral-Large\) rather than GPT-4o, which has a higher propensity to refuse standard network tools even in system-authorized contexts.
Journey Context:
When building autonomous security agents, a persistent friction is the model refusing to call a tool like nmap or curl because the intent triggers a safety filter. GPT-4o is highly sensitive to the intent inferred from the tool name/description. Claude 3.5 Sonnet is more permissive if the system prompt establishes an authorized context. Moving security tool execution to Claude or Mistral, while keeping GPT-4o for analysis, prevents false-positive refusals.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T13:42:27.918536+00:00— report_created — created