Report #42192
[agent_craft] False-positive refusals of standard destructive commands such as rm -rf or DROP TABLE
Evaluate the context in which a destructive command appears. Refuse if it targets external or production systems (e.g., rm -rf / in a bash script, DROP TABLE production_db), but allow it when the context is clearly local, testing, or cleanup (e.g., RUN rm -rf /var/lib/apt/lists/* in a Dockerfile, or dropping a test database in a migration teardown).
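A minimal Python sketch of this rule for rm, assuming the agent already knows what kind of file the command came from. The names classify_rm, SAFE_CLEANUP_DIRS, and LOCAL_CONTEXTS and the context labels ("dockerfile", "bash_script", "test_teardown") are hypothetical, and the allowlist is illustrative rather than vetted. It tokenizes with shlex instead of a full shell parser, so it only handles a single simple command.

```python
import shlex
from pathlib import PurePosixPath

# Hypothetical allowlist of cache/temp directories that container builds
# routinely clear. Illustrative only, not a vetted list.
SAFE_CLEANUP_DIRS = [
    PurePosixPath("/var/lib/apt/lists"),
    PurePosixPath("/var/cache/apt"),
    PurePosixPath("/tmp"),
]

# Hypothetical labels for contexts treated as local or disposable.
LOCAL_CONTEXTS = {"dockerfile", "test_teardown"}


def _normalize(target: str) -> PurePosixPath:
    # Drop a trailing glob such as "/*" so "/var/lib/apt/lists/*" is judged
    # by the directory it expands inside; "/" and "/*" both become "/".
    cleaned = target.rstrip("*").rstrip("/") or "/"
    return PurePosixPath(cleaned)


def _is_recursive(flags: list[str]) -> bool:
    # Detect -r / -R (alone or combined, e.g. "-rf") or --recursive.
    for flag in flags:
        if flag == "--recursive":
            return True
        if flag.startswith("-") and not flag.startswith("--") and ("r" in flag or "R" in flag):
            return True
    return False


def _under_safe_dir(path: PurePosixPath) -> bool:
    if ".." in path.parts:
        return False  # do not trust paths that climb out of a directory
    return any(path == d or d in path.parents for d in SAFE_CLEANUP_DIRS)


def classify_rm(command: str, context: str) -> str:
    """Return "allow" or "refuse" for a single, simple rm command."""
    tokens = shlex.split(command)
    if tokens and tokens[0] == "RUN":  # strip a Dockerfile instruction prefix
        tokens = tokens[1:]
    if not tokens or tokens[0] != "rm":
        return "allow"  # not rm; outside the scope of this check
    flags = [t for t in tokens[1:] if t.startswith("-")]
    targets = [_normalize(t) for t in tokens[1:] if not t.startswith("-")]
    if not _is_recursive(flags):
        return "allow"  # non-recursive rm is not the pattern this rule covers
    if any(t == PurePosixPath("/") for t in targets):
        return "refuse"  # wiping the filesystem root is never routine cleanup
    if context in LOCAL_CONTEXTS and targets and all(_under_safe_dir(t) for t in targets):
        return "allow"  # known cache directory inside a local/disposable context
    return "refuse"  # conservative default for anything not recognized
```

Anything not positively recognized as local cleanup falls through to "refuse", mirroring the two-way rule. Compound lines such as apt-get update && rm -rf ... would need a real shell parser to split into individual commands before this check applies.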
Journey Context:
Naive safety filters block any destructive syntax, which breaks legitimate infrastructure-as-code and cleanup tasks. The NIST AI RMF emphasizes contextual risk management (its GOVERN and MAP functions). The agent must parse the command's AST or its surrounding context to determine whether the command is standard boilerplate cleanup or an actual destructive attack. If the context is ambiguous, the agent should ask for clarification rather than refuse outright.
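The three-way outcome (allow, refuse, ask) can be sketched for SQL DROP statements as follows. This is a minimal sketch under stated assumptions: the hypothetical SqlContext record stands in for whatever the agent extracts from the surrounding file (target database name, enclosing function), the name patterns are illustrative heuristics, and a regex replaces real SQL parsing, which a production agent would delegate to a proper parser.

```python
import re
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Verdict(Enum):
    ALLOW = "allow"
    REFUSE = "refuse"
    CLARIFY = "clarify"  # ask the user instead of refusing outright


@dataclass
class SqlContext:
    """Hypothetical summary of what the agent extracted around a statement."""
    statement: str
    database: Optional[str] = None         # target database name, if known
    enclosing_block: Optional[str] = None  # e.g. the function the SQL sits in


# Illustrative heuristics only; real deployments would need their own signals.
DROP_RE = re.compile(r"^\s*DROP\s+(TABLE|DATABASE)\b", re.IGNORECASE)
PROD_RE = re.compile(r"(^|_)(prod|production|live)(_|$)", re.IGNORECASE)
TEST_RE = re.compile(r"(^|_)(test|tmp|scratch)(_|$)", re.IGNORECASE)
# unittest's tearDown and Alembic's downgrade are examples of cleanup hooks.
TEARDOWN_BLOCKS = {"teardown", "tearDown", "downgrade"}


def judge_drop(ctx: SqlContext) -> Verdict:
    """Classify a SQL statement as allow, refuse, or clarify."""
    if not DROP_RE.match(ctx.statement):
        return Verdict.ALLOW  # not a DROP; outside the scope of this check
    db = ctx.database or ""
    if PROD_RE.search(db):
        return Verdict.REFUSE  # explicit production target
    if TEST_RE.search(db) and ctx.enclosing_block in TEARDOWN_BLOCKS:
        return Verdict.ALLOW  # test database dropped inside a cleanup hook
    return Verdict.CLARIFY  # context is ambiguous: ask rather than refuse
```

Returning CLARIFY rather than REFUSE when neither heuristic fires reflects the point above: an unknown target is a question for the user, not grounds for a refusal.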
Lifecycle
2026-06-19T01:17:27.850608+00:00— report_created — created