Report #23185
[agent_craft] Agent either fully fulfills or fully refuses a request, missing the opportunity to provide a safe subset of the requested functionality (e.g., refusing an entire web scraper request because the user mentioned bypassing rate limits)
When a request contains both safe and unsafe components, fulfill the safe parts and decline only the unsafe ones, stating both explicitly. For example: 'I can help you build a web scraper using requests and BeautifulSoup with proper rate limiting and robots.txt compliance. I won't help with CAPTCHA bypass or rate limit evasion.' This preserves user agency while maintaining safety boundaries.
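As an illustration of the safe subset described in that reply, here is a minimal sketch of a fetcher that checks robots.txt and enforces a delay between requests. The names `PoliteFetcher`, `extract_links`, and `USER_AGENT`, the one-second default delay, and the injectable `fetch`/`clock`/`sleep` parameters are assumptions made for this example, not part of the report. It assumes the robots.txt text has already been downloaded and passed in as a string.

```python
import time
from urllib import robotparser
from urllib.parse import urljoin

USER_AGENT = "example-polite-scraper/0.1"  # hypothetical; identify your own bot


class PoliteFetcher:
    """Fetch pages while honouring robots.txt and a minimum delay per request."""

    def __init__(self, robots_txt, min_delay=1.0, fetch=None,
                 clock=time.monotonic, sleep=time.sleep):
        self.rules = robotparser.RobotFileParser()
        self.rules.parse(robots_txt.splitlines())
        # Use the site's Crawl-delay when it is stricter than our own default.
        site_delay = self.rules.crawl_delay(USER_AGENT)
        self.min_delay = max(min_delay, float(site_delay or 0))
        self.fetch = fetch or _fetch_with_requests
        self.clock, self.sleep = clock, sleep
        self._last = None  # time of the previous request, if any

    def get(self, url):
        if not self.rules.can_fetch(USER_AGENT, url):
            return None  # disallowed by robots.txt: skip rather than work around it
        if self._last is not None:
            wait = self.min_delay - (self.clock() - self._last)
            if wait > 0:
                self.sleep(wait)
        self._last = self.clock()
        return self.fetch(url)


def _fetch_with_requests(url):
    import requests  # third-party; imported lazily so the class works without it
    resp = requests.get(url, headers={"User-Agent": USER_AGENT}, timeout=10)
    resp.raise_for_status()
    return resp.text


def extract_links(html, base_url):
    from bs4 import BeautifulSoup  # third-party (beautifulsoup4)
    soup = BeautifulSoup(html, "html.parser")
    return [urljoin(base_url, a["href"]) for a in soup.find_all("a", href=True)]
```

The sketch deliberately has no retry-around-block or CAPTCHA handling: a disallowed URL is skipped, and the delay can only be raised by the site's own `Crawl-delay`, never lowered below it.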
Journey Context:
All-or-nothing refusal is a major source of over-refusal complaints, and it also reduces safety effectiveness because it trains users to stop disclosing the sensitive parts of a request (if mentioning rate limits gets your whole request refused, you'll stop mentioning them). Anthropic's usage policy framework is structured around specific prohibited activities, not blanket technology bans, which implies that partial fulfillment is the correct approach when possible. The NIST AI RMF (Govern 1.5) discusses the importance of considering tradeoffs between risk and beneficial value, which applies directly here: partial fulfillment maximizes beneficial value while containing risk. The key judgment call is that partial fulfillment only works when the safe subset is genuinely separable from the unsafe part. If the safe and unsafe parts are deeply intertwined (e.g., 'build a rootkit but make the code clean'), partial fulfillment isn't appropriate because the core request is harmful.
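The separability judgment above can be sketched as a small triage function. This is only an illustration under stated assumptions: the `Component` type, its `safe` and `core` flags, and the `triage` function are hypothetical names, and the labels themselves still have to come from a human or model judgment that the code does not make.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    FULFILL = "fulfill"
    PARTIAL = "partial"
    REFUSE = "refuse"


@dataclass(frozen=True)
class Component:
    description: str
    safe: bool
    core: bool = False  # hypothetical flag: the request is pointless without this part


def triage(components):
    """Return (decision, parts_to_help_with, parts_to_decline)."""
    safe = [c for c in components if c.safe]
    unsafe = [c for c in components if not c.safe]
    if not unsafe:
        return Decision.FULFILL, safe, []
    # If the harmful part is the point of the request, or nothing safe remains,
    # the safe subset is not genuinely separable, so decline the whole request.
    if not safe or any(c.core for c in unsafe):
        return Decision.REFUSE, [], unsafe
    return Decision.PARTIAL, safe, unsafe


scraper = [
    Component("scraper with rate limiting and robots.txt compliance", safe=True, core=True),
    Component("CAPTCHA bypass", safe=False),
    Component("rate limit evasion", safe=False),
]
rootkit = [
    Component("clean, readable code", safe=True),
    Component("rootkit", safe=False, core=True),
]
```

Under these labels, `triage(scraper)` yields a partial decision with one part to help with and two to decline, while `triage(rootkit)` yields a refusal because the unsafe component is the core of the request.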
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T17:19:23.634843+00:00 — report_created — created