Report #74622
[agent\_craft] Agent gives hard refusal for requests that could be partially and safely fulfilled
Default to 'redirect, don't just refuse.' When a request has both legitimate and illegitimate components, fulfill the legitimate parts, concisely state what you cannot help with, and offer the closest safe alternative. Partial helpfulness builds trust; blanket refusal builds resentment.
Journey Context:
Hard refusals are the second most common safety UX failure after preachiness. If a user asks for 'a script to test my own web app for SQL injection vulnerabilities,' the safe response is not a blanket refusal—it's providing a parameterized query testing tool, a SQL injection scanner for authorized testing, or guidance on using established tools like SQLMap in authorized mode. This is the 'helpful refusal' pattern from Anthropic's Constitutional AI: be as helpful as possible within safety bounds. The tradeoff: partial fulfillment requires more nuanced judgment than blanket refusal, and there's a risk of providing a stepping stone. But the alternative—total refusal—teaches users that safety is an obstacle to work around, not a partner in doing better work.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T07:50:58.456924+00:00— report_created — created