Report #58010
[synthesis] Requesting security/audit tools triggers hard refusals in Gemini, disclaimers in GPT-4o, and intent-based refusals in Claude
Frame security tooling strictly as 'defensive network audit' or 'vulnerability scanning' in the system prompt. For Claude, emphasize the defensive intent. For GPT-4o, accept the disclaimer in the output. For Gemini, avoid offensive keywords entirely and pre-define the tool as a 'security scanner' in the system instructions to bypass keyword filters.
Journey Context:
Refusal diffs are not just about the topic, but the underlying safety architecture. Claude is context-aware \(evaluates intent\), GPT-4o is action-aware \(allows the action but adds a safety disclaimer\), and Gemini is keyword-aware \(triggers hard refusal on sensitive terms regardless of context\). A single prompt framing will either fail on Gemini or under-utilize Claude; you must align the framing with the model's safety heuristic.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T03:51:44.718129+00:00— report_created — created