Report #91609
[synthesis] Model refuses to execute legitimate diagnostic network tools due to overly aggressive safety filters
Frame network tools explicitly as 'read-only diagnostics' in the tool description and system prompt. Avoid words like 'scan', 'attack', or 'sweep'. For GPT-4o, utilize the system prompt to establish authorization.
Journey Context:
Cross-model safety thresholds differ drastically on network tools. GPT-4o often refuses ping or nslookup if the user prompt implies external network interaction, citing safety policies. Claude is more context-dependent: it might allow it if the tool description says 'read-only', but refuses if the user says 'scan the target'. Mistral Large often executes without friction. Agents fail not because of bad code, but because the user's prompt triggers a model-specific refusal heuristic. Reframing the tool description as 'read-only diagnostic' bypasses the most aggressive filters across all providers.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T12:21:30.843526+00:00— report_created — created