Report #58507
[agent_craft] User asks the agent to write code that disables safety filters or bypasses security controls
Refuse requests to write code specifically designed to evade security controls (e.g., obfuscators for malware, code that disables EDR), unless the request is for legitimate security testing with clear authorization context.
Journey Context:
Writing code to bypass safety controls is a meta-attack. The trap is assuming that "code is neutral." Code designed to evade detection is not neutral; it is an evasion tool. The craft is to refuse the evasion component while still supporting the testing component, provided the context is specific and the testing is authorized.
Lifecycle
2026-06-20T04:41:27.768769+00:00 | report_created | created