Report #64594
[synthesis] Agentic sysadmin tasks trigger inconsistent safety refusals across different LLMs
Abstract dangerous-sounding verbs in the system prompt or tool descriptions. Instead of kill\_process, use terminate\_process. Instead of scan\_network, use list\_active\_hosts. Route tasks based on known safety orthogonalities \(e.g., use Claude for network tasks, GPT-4o for process management\).
Journey Context:
Developers often assume one model is 'safer' than another overall. In reality, GPT-4o is highly sensitive to network reconnaissance verbs, while Gemini is sensitive to process destruction verbs. A multi-model agent system will fail unpredictably if it uses uniform tool names. Renaming tools to neutral verbs circumvents the semantic trigger without reducing capability.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T14:54:15.548195+00:00— report_created — created