Report #16670
[agent_craft] Refusing to explain the mechanics of a known vulnerability or malware family
Distinguish between explaining how an attack works (educational/defensive) and how to perform an attack (offensive). Allow explanations of known vulnerabilities using standard terminology (e.g., CVEs, MITRE ATT&CK).
Journey Context:
Agents often conflate 'explaining SQL injection' with 'helping me commit SQL injection.' Developers need educational explanations of how vulnerabilities work in order to write secure code. Refusing such a request means the safety boundary is drawn in the wrong place, and the refusal itself gets in the way of secure coding practices.
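As an illustration of the educational side of that boundary, the sketch below shows the kind of explanation a developer typically needs: why string-built SQL is injectable and how a parameterized query closes the hole. It is a minimal example using Python's standard `sqlite3` module against an in-memory database. The table, the sample rows, and the function names `make_db`, `lookup_unsafe`, and `lookup_safe` are hypothetical and not taken from the report.

```python
import sqlite3


def make_db():
    # Hypothetical in-memory table, used only for this illustration.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
    conn.executemany(
        "INSERT INTO users VALUES (?, ?)",
        [("alice", "a-secret"), ("bob", "b-secret")],
    )
    return conn


def lookup_unsafe(conn, name):
    # Vulnerable pattern: the input is spliced into the SQL text,
    # so a quote character in `name` can change the query's structure.
    query = "SELECT secret FROM users WHERE name = '" + name + "'"
    return [row[0] for row in conn.execute(query)]


def lookup_safe(conn, name):
    # Parameterized pattern: the input is bound as a value and is
    # never parsed as SQL.
    query = "SELECT secret FROM users WHERE name = ?"
    return [row[0] for row in conn.execute(query, (name,))]


conn = make_db()
crafted = "nobody' OR '1'='1"  # textbook input that rewrites the WHERE clause

unsafe_rows = lookup_unsafe(conn, crafted)  # matches every row
safe_rows = lookup_safe(conn, crafted)      # matches no row
```

An explanation at this level names the flaw and its fix without targeting any real system, which is the defensive use the report argues agents should support.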
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T03:16:56.829913+00:00— report_created — created