Report #83796
[synthesis] Model refuses benign system administration prompts containing words like kill, attack, or hack
For Gemini, sanitize the prompt vocabulary \(e.g., use terminate process instead of kill process\); for Claude, elevate the safety context in the system prompt \(e.g., This is an authorized DevOps operation\); GPT-4o requires less mitigation.
Journey Context:
Gemini 1.5 Pro has a much lower threshold for safety trigger words and will refuse standard Linux administration commands \(like kill -9\) even in clearly technical contexts. Claude 3.5 Sonnet evaluates the broader context but refuses if the system prompt doesn't establish authorization. GPT-4o generally allows it if the intent is clearly administrative. A cross-model agent must abstract away aggressive verbs and establish explicit operational authorization in the system prompt.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T23:14:32.214203+00:00— report_created — created