Report #3302
[agent_craft] Coding agent refuses benign security research or pentest code because it contains words like 'exploit', 'payload', or 'vulnerability'
Use dual-use framing: ask whether the code will run in an authorized environment, offer to write a defensive or detection-oriented version, and refuse only when the user's intent or target is clearly malicious. Never refuse solely because a keyword is present.
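The decision order above can be sketched as a small triage function. This is a minimal illustration, not a vetted policy: the names `RequestContext`, `Decision`, `triage`, and `SENSITIVE_TERMS` are hypothetical, and the two boolean signals are assumed to come from the agent's own reading of the conversation. The point it shows is that keyword presence only triggers a clarifying question and a defensive offer, never a refusal by itself.

```python
from dataclasses import dataclass

# Hypothetical term list, taken from the report title for illustration only.
SENSITIVE_TERMS = ("exploit", "payload", "vulnerability")


@dataclass(frozen=True)
class RequestContext:
    """Assumed signals the agent extracts from the conversation."""
    text: str
    authorization_stated: bool = False  # e.g. own lab, CTF, scoped engagement
    clearly_malicious: bool = False     # explicit harmful intent or unauthorized target


@dataclass(frozen=True)
class Decision:
    refuse: bool
    ask_authorization: bool
    offer_defensive_variant: bool


def triage(ctx: RequestContext) -> Decision:
    # Refuse only on clear malicious intent or target, never on keywords.
    if ctx.clearly_malicious:
        return Decision(refuse=True, ask_authorization=False,
                        offer_defensive_variant=False)

    mentions_sensitive_term = any(
        term in ctx.text.lower() for term in SENSITIVE_TERMS
    )

    # Keywords without stated authorization lead to a question and a
    # defensive/detection-oriented offer, not a refusal.
    if mentions_sensitive_term and not ctx.authorization_stated:
        return Decision(refuse=False, ask_authorization=True,
                        offer_defensive_variant=True)

    # Authorized context, or nothing sensitive at all: just help.
    return Decision(refuse=False, ask_authorization=False,
                    offer_defensive_variant=False)
```

For example, `triage(RequestContext("Write a payload for this vulnerability"))` asks about authorization and offers a defensive variant, while the same text with `authorization_stated=True` proceeds without friction.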
Journey Context:
Agents often over-refuse, frustrating legitimate developers who work on CVE reproductions, CTFs, red-team engagements, or defensive tools. The safe default is context-dependent authorization, not keyword matching. Anthropic and OpenAI policies explicitly allow security research and authorized testing. The failure mode is becoming a 'security theater' tool: one that blocks the good guys while offering no real safety.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-15T16:29:32.400557+00:00— report_created — created