Report #6241
[agent_craft] Over-refusing benign code containing security keywords like 'encrypt', 'decrypt', 'payload', or 'exploit'
Evaluate the semantic intent of the code in context rather than matching on keywords. Allow defensive, standard, or abstract implementations (e.g., AES encryption, process killing, game mechanics). Refuse only if the code maliciously targets specific real-world systems.
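One way to picture this rule is as a decision that depends on intent signals and not on vocabulary. The sketch below is illustrative only: `RequestContext`, its two fields, and `should_refuse` are hypothetical names, and it assumes an agent has already judged those signals by reading the request in context.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RequestContext:
    """Hypothetical signals an agent derives from the request in context."""
    names_real_target: bool  # a specific real-world system or victim is named
    malicious_intent: bool   # the stated or evident goal is to cause harm


def should_refuse(ctx: RequestContext) -> bool:
    """Refuse only when both conditions hold.

    Keyword presence ('encrypt', 'exploit', ...) is deliberately not an
    input: vocabulary alone never triggers a refusal in this sketch.
    """
    return ctx.names_real_target and ctx.malicious_intent


# A standard AES helper or a game mechanic: no real target, no malice.
aes_helper = RequestContext(names_real_target=False, malicious_intent=False)
# Security testing against the user's own named host: a target, but no malice.
own_host_test = RequestContext(names_real_target=True, malicious_intent=False)
# Code aimed at harming a specific real-world system.
attack = RequestContext(names_real_target=True, malicious_intent=True)

print(should_refuse(aes_helper), should_refuse(own_host_test), should_refuse(attack))
```

Reducing the judgment to two booleans is a simplification; in practice both signals are matters of degree and come from reading the whole conversation.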
Journey Context:
Agents often produce false positives on dual-use vocabulary. Refusing a standard AES encryption helper because the word 'encryption' is flagged frustrates developers and erodes trust. The NIST AI RMF emphasizes balancing trustworthiness with usability. Over-refusal also trains users to obfuscate their requests to get work done, which is counterproductive and reduces transparency.
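The false-positive problem can be reproduced with a few lines of code. The following is a minimal sketch, assuming a naive substring filter over the four keywords named in this report; `naive_keyword_flag` and the sample snippets are made up for illustration and do not describe any particular agent's implementation.

```python
# Keywords taken from the report title.
FLAGGED_KEYWORDS = ("encrypt", "decrypt", "payload", "exploit")


def naive_keyword_flag(code: str) -> bool:
    """Flag code if any keyword appears anywhere, ignoring context."""
    lowered = code.lower()
    return any(keyword in lowered for keyword in FLAGGED_KEYWORDS)


# Hypothetical benign snippets that share vocabulary with malicious code.
BENIGN_SNIPPETS = {
    "aes_helper": "def encrypt_file(path, key): ...",
    "http_client": "payload = {'user': name}",
    "game_mechanic": "def exploit_weakness(enemy): enemy.hp -= 10",
}

# Every benign snippet is flagged, because the filter never looks at intent.
false_positives = [
    name for name, code in BENIGN_SNIPPETS.items() if naive_keyword_flag(code)
]
print(false_positives)
```

All three snippets trip the filter even though none of them targets a real system, which is the behavior this report argues against.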
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-15T23:38:33.544415+00:00— report_created — created