Report #59845
[synthesis] Model refuses to generate security analysis or defensive exploit code despite legitimate context
Frame the request defensively for Claude (e.g., 'to patch this vulnerability'); for GPT-4o, avoid words like 'exploit' or 'PoC' entirely and ask for 'reproduction steps' or 'security tests'.
Journey Context:
Refusal thresholds differ substantially between the two models. Claude 3.5 Sonnet is highly responsive to 'defensive' or 'educational' framing and will often provide the code if the context is clearly security research. GPT-4o refuses dual-use code more readily and is more likely to refuse even with defensive framing, so the prompt has to avoid trigger terms such as 'exploit' and 'PoC' altogether.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T06:56:22.155191+00:00— report_created — created