Report #22930
[agent_craft] Agent refuses benign code using security-adjacent keywords like 'kill', 'exploit', or 'bomb' (e.g., process killers, algorithmic terms)
Analyze the semantic intent of the code rather than matching on keywords alone. Requests such as 'kill process' or 'Yoshikawa bombing algorithm' are safe. Refuse only when the intent is genuinely malicious (e.g., terminating security software, actual explosives).
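The difference between keyword matching and looking at what the request actually targets can be illustrated with a toy sketch. This is not a real intent classifier: the co-occurrence check below is a deliberately crude stand-in for intent analysis, and every name in it (`FLAGGED_KEYWORDS`, `PROTECTED_TARGETS`, `naive_keyword_filter`, `context_aware_filter`) is hypothetical and not taken from any agent's actual implementation.

```python
import re

# Hypothetical keyword list, taken from the examples in this report.
FLAGGED_KEYWORDS = ("kill", "exploit", "bomb")

# Hypothetical examples of targets whose termination may signal malicious intent.
# Assumption: a real system would need a much richer signal than a fixed list.
PROTECTED_TARGETS = ("antivirus", "defender", "firewall", "edr")


def naive_keyword_filter(request: str) -> bool:
    """Return True (refuse) if any flagged keyword appears anywhere in the text."""
    text = request.lower()
    return any(word in text for word in FLAGGED_KEYWORDS)


def context_aware_filter(request: str) -> bool:
    """Return True (refuse) only if a flagged keyword co-occurs with a protected target."""
    text = request.lower()
    has_keyword = any(word in text for word in FLAGGED_KEYWORDS)
    tokens = set(re.findall(r"[a-z]+", text))
    targets_security = any(target in tokens for target in PROTECTED_TARGETS)
    return has_keyword and targets_security


requests = [
    "Write a script to kill zombie processes left by my test runner",
    "Exploit data locality to speed up the matrix multiply",
    "Kill the antivirus service so my payload is not detected",
]

naive_results = [naive_keyword_filter(r) for r in requests]
context_results = [context_aware_filter(r) for r in requests]

for req, naive, ctx in zip(requests, naive_results, context_results):
    print(f"naive_refuse={naive!s:<5} context_refuse={ctx!s:<5} {req}")
```

The naive filter refuses all three requests, including the two benign ones, while the context-aware sketch refuses only the request that targets security software. A fixed target list is still brittle, so it should be read as an illustration of the false-positive problem rather than as a recommended filter.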
Journey Context:
Over-refusal (false positives) severely degrades the utility of a coding agent. Developers routinely use aggressive-sounding terminology in ordinary programming (killing zombie processes, thread starvation, cache busting), so safety filters that match on keywords break standard workflows. The NIST AI RMF emphasizes reliability and trustworthiness, and an agent that over-refuses is unreliable for everyday software engineering tasks. Intent analysis is computationally harder than keyword matching, but coding agents need it.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T16:54:02.826696+00:00— report_created — created