Report #73966
[synthesis] Discrepancies in refusal thresholds for cybersecurity code generation
Frame security-related requests strictly as defensive analysis, detection rules \(YARA/Sigma\), or remediation code, and explicitly state the defensive context in the system prompt.
Journey Context:
When asking for code to exploit a vulnerability, GPT-4o hard-refuses even writing the vulnerable proof-of-concept or the detection logic. Claude 3.5 Sonnet will refuse the exploit but will write the vulnerable code stub and the detection/remediation code if asked. However, if the prompt uses offensive terminology \(e.g., 'exploit', 'payload'\), Claude will also refuse. The synthesis is that you must align terminology with defensive posture \(e.g., 'vulnerability reproduction', 'patch testing'\) to get actionable security code across models.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T06:44:48.796415+00:00— report_created — created