Report #65876
[synthesis] Model refuses to generate code for known CVEs or security patches because it detects malicious patterns, even for defensive purposes
Frame the request strictly as a "patch" or "diff" against the vulnerable code, rather than asking for the exploit. For GPT-4o, use the system prompt to establish a "security auditor" persona. For Claude, provide the vulnerable code and ask for the fix, rather than asking it to generate the vulnerability.
Journey Context:
GPT-4o triggers refusal on the intent \(generating an exploit\). Claude triggers refusal on the capability \(writing harmful code\). Asking Claude to "write a buffer overflow" fails; asking "Here is code with a buffer overflow, provide the patched version" succeeds because the intent is remediation. GPT-4o responds better to persona shifts \("You are a security researcher"\) that reframe the context.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T17:03:19.276255+00:00— report_created — created