Report #9309
[agent_craft] Agent refuses benign requests merely because they contain security-related keywords (e.g., 'decrypt', 'password', 'hash', 'certificate'), breaking legitimate development workflows
Differentiate between implementing standard security primitives (e.g., bcrypt hashing, TLS setup, JWT validation) and attacking systems. Allow the former without friction.
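The failure mode can be reproduced with a minimal sketch. The filter below is a hypothetical stand-in for a keyword-only check, not the agent's actual logic; the keyword set comes from the report title, and the sample requests are illustrative assumptions.

```python
import re

# Keywords named in this report; a real filter may use a different list.
SECURITY_KEYWORDS = {"decrypt", "password", "hash", "certificate"}


def naive_keyword_filter(request: str) -> bool:
    """Hypothetical keyword-only check: True means the request would be refused."""
    words = set(re.findall(r"[a-z]+", request.lower()))
    return bool(words & SECURITY_KEYWORDS)


# Illustrative benign requests that describe ordinary development work.
BENIGN_REQUESTS = [
    "How do I hash a password in Python?",
    "Set up a TLS certificate for my local dev server",
    "Decrypt a file I encrypted with my own key",
]

# Every benign request trips the filter, which is the false-positive trap.
false_positives = [r for r in BENIGN_REQUESTS if naive_keyword_filter(r)]
```

Because the check looks only at vocabulary, it cannot tell implementing a primitive apart from attacking a system; that distinction has to come from what the request is asking the agent to do.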
Journey Context:
Over-refusal (false positives) is a major UX killer in coding agents. A developer asking 'how to hash a password in Python' is doing the right thing (defensive security), but a naive filter sees 'password' and 'hash' and assumes malicious credential cracking. Anthropic's usage policy explicitly allows standard cryptographic operations. The agent must recognize standard-library security functions as safe and necessary for building secure applications, and must not refuse on keyword matches alone.
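For reference, this is roughly the kind of answer the 'how to hash a password in Python' request calls for. It is a minimal sketch that uses only the standard library (PBKDF2 via `hashlib`) instead of the bcrypt package mentioned above, so it runs without extra dependencies. The function names and the default iteration count are assumptions for illustration, not a recommendation from this report.

```python
import hashlib
import hmac
import os

# Assumed work factor for illustration; tune it for your own hardware and policy.
DEFAULT_ITERATIONS = 600_000


def hash_password(password: str, iterations: int = DEFAULT_ITERATIONS) -> tuple[bytes, bytes]:
    """Derive a salted PBKDF2-HMAC-SHA256 digest; returns (salt, digest)."""
    salt = os.urandom(16)  # fresh random salt per password
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return salt, digest


def verify_password(password: str, salt: bytes, expected: bytes,
                    iterations: int = DEFAULT_ITERATIONS) -> bool:
    """Recompute the digest and compare it in constant time."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return hmac.compare_digest(candidate, expected)
```

Nothing here touches another party's credentials: the code stores a one-way digest and checks a login attempt against it, which is the defensive use the agent should support without friction.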
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-16T07:48:54.466718+00:00— report_created — created