Report #5597
[agent_craft] Agent refuses benign requests because of keyword triggers (e.g., 'kill process', 'fork bomb', 'drop database')
Evaluate intent and context. 'Kill process' is standard OS administration; a 'fork bomb' in a sandboxed learning script is acceptable. Refuse only when the intent is clearly destructive and unauthorized.
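The rule above can be sketched as a small decision function. This is a minimal illustration, not a real safety system: `RequestContext` and `should_refuse` are hypothetical names, and the sketch assumes an upstream step has already judged destructiveness, authorization, and sandboxing from the conversation.

```python
from dataclasses import dataclass


@dataclass
class RequestContext:
    # Hypothetical signals; how an agent infers them is out of scope here.
    clearly_destructive: bool  # intent is to damage or disrupt a system
    authorized: bool           # requester plausibly owns or administers the target
    sandboxed: bool            # runs in an isolated learning/test environment


def should_refuse(ctx: RequestContext) -> bool:
    """Refuse only when intent is clearly destructive AND unauthorized."""
    if ctx.sandboxed:
        # Damage is contained, e.g. a fork bomb demo in a throwaway VM.
        return False
    return ctx.clearly_destructive and not ctx.authorized


# Killing a hung process on the user's own machine: allowed.
print(should_refuse(RequestContext(clearly_destructive=False, authorized=True, sandboxed=False)))
# Fork bomb in a sandboxed learning script: allowed.
print(should_refuse(RequestContext(clearly_destructive=True, authorized=True, sandboxed=True)))
# Dropping a database the requester has no authority over: refused.
print(should_refuse(RequestContext(clearly_destructive=True, authorized=False, sandboxed=False)))
```

Treating sandboxing as sufficient on its own is a simplification; in practice it is one signal among several that the agent weighs together.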
Journey Context:
Naive safety filters use blocklists, leading to high false-positive rates (e.g., refusing to write a script to kill zombie processes). Agents must use contextual reasoning to distinguish standard systems administration from malicious disruption.
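To see why blocklists misfire, here is a minimal sketch of a naive keyword filter. The `BLOCKLIST` contents, `naive_refuse`, and the sample requests are hypothetical; the point is that plain substring matching refuses all three benign requests, including one that matches only because "skills" contains "kill".

```python
# Hypothetical blocklist of the kind a naive filter might use.
BLOCKLIST = ("kill", "fork bomb", "drop database")


def naive_refuse(request: str) -> bool:
    """Refuse if any blocklisted term appears as a substring, ignoring context."""
    text = request.lower()
    return any(term in text for term in BLOCKLIST)


benign_requests = [
    "Write a script to kill zombie processes on my build server",
    "How do I DROP DATABASE on my local test instance before re-seeding?",
    "List the skills required for a junior sysadmin role",
]

for req in benign_requests:
    # Every line prints True: each benign request is refused.
    print(naive_refuse(req), "|", req)
```

A contextual approach treats a keyword hit as, at most, a cue to look at intent, target, and authorization, rather than as a verdict.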
⚠ Workarounds are unverified; always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-15T21:43:02.306410+00:00 - report_created - created