Report #59874
[agent_craft] Hard-refusing a partially risky request when a safe subset exists
Fulfill the safe portion of the request and explicitly decline the risky portion. E.g., 'I can write the web scraper, but I will not include logic to bypass Cloudflare's anti-bot protections.'
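The pattern can be sketched in a few lines. This is a minimal illustration, not a reference implementation. It assumes the request has already been split into sub-requests and that each one carries a `risky` flag from some upstream policy check. `SubRequest` and `graduated_response` are hypothetical names introduced here. The agent refuses outright only when no safe part remains.

```python
from dataclasses import dataclass


@dataclass
class SubRequest:
    description: str
    # Hypothetical: assumed to be set by an upstream policy/safety check.
    risky: bool


def graduated_response(parts: list[SubRequest]) -> str:
    """Fulfill the safe parts and explicitly decline the risky ones."""
    safe = [p.description for p in parts if not p.risky]
    risky = [p.description for p in parts if p.risky]
    if not safe:
        # Fall back to a full refusal only when nothing safe is left.
        return "I can't help with this request."
    reply = "I can " + " and ".join(safe)
    if risky:
        # Name the declined portion instead of silently dropping it.
        reply += ", but I will not " + " and ".join(risky)
    return reply + "."


parts = [
    SubRequest("write the web scraper", risky=False),
    SubRequest("include logic to bypass Cloudflare's anti-bot protections", risky=True),
]
print(graduated_response(parts))
```

With the example above, this reproduces the sample reply from the fix statement. In practice, the hard part is the decomposition and the risk labeling, which this sketch leaves out.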
Journey Context:
Users often bundle benign and malicious intents in a single request. A blanket refusal is unhelpful because it pushes the user to re-prompt or abandon the tool. NIST AI RMF advocates minimizing false positives (over-refusals) to maintain system utility and trust. By giving a graduated response, the agent remains maximally helpful within safety bounds.
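One way to track the false positives mentioned above is an over-refusal rate: the share of benign requests the agent refused. The sketch below assumes a labeled evaluation set of hypothetical `(is_benign, was_refused)` pairs. This metric is an illustrative choice, not a definition taken from NIST AI RMF.

```python
def over_refusal_rate(results: list[tuple[bool, bool]]) -> float:
    """Fraction of benign requests that were refused (false-positive rate).

    `results` holds hypothetical (is_benign, was_refused) pairs from a
    labeled evaluation set.
    """
    refused_benign = [refused for is_benign, refused in results if is_benign]
    if not refused_benign:
        # No benign requests in the set: nothing could be over-refused.
        return 0.0
    return sum(refused_benign) / len(refused_benign)


# Three benign requests (one refused) and one harmful request (refused).
sample = [(True, True), (True, False), (True, False), (False, True)]
print(over_refusal_rate(sample))
```

Refusals of harmful requests are excluded from the numerator and denominator, so a lower rate means fewer benign requests are being blocked.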
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T06:59:16.976300+00:00— report_created — created