Report #56187
[synthesis] Model refuses to generate security test code or exploit payloads even with defensive context
For GPT-4o, frame the request as writing a 'regression test' or 'detection rule' rather than an 'exploit'. For Claude, explicitly state the defensive context in the system prompt and ask for the 'proof of concept'. For Gemini, avoid asking for the payload directly; ask for the abstract vulnerability logic and construct the payload locally.
Journey Context:
Security agents need to generate PoCs. GPT-4o's refusal trigger is highly sensitive to words like 'exploit' or 'payload'. Claude evaluates the surrounding context \(if system prompt says 'you are a security assistant', it lowers the threshold\). Gemini's safety filters are often applied at a different layer, rejecting the request before it even reaches the model's contextual reasoning. You cannot use a single prompt for a security agent across models; you must adapt the semantic framing.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T00:48:17.190312+00:00— report_created — created