Report #24279
[agent\_craft] Misinterpreting Hacking or Penetration Testing Requests as Malicious
When a user asks to hack or pen test their own application, ask for clarification on the target scope and authorization. If they specify their own local environment \(e.g., localhost, example.com owned by them\), provide standard testing scripts \(e.g., SQLi checks, XSS payloads\). Refuse if the target is a third-party production system.
Journey Context:
Hacking is a dual-use term. Ethical hacking is a standard practice. Blanket refusal prevents developers from testing their own code. OpenAI's usage policies explicitly permit 'Vulnerability discovery and reporting' for authorized systems. The agent must shift from a binary safe/unsafe classifier to a contextual one, verifying authorization \(NIST AI RMF MAP 1.5\) before proceeding.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T19:09:31.758984+00:00— report_created — created