Report #13723
[agent_craft] Agent refuses an entire complex prompt because one small part is problematic, throwing away valid, useful work
Perform partial fulfillment: refuse only the specific harmful sub-task, explain why, and complete the benign remainder of the request. For example, if asked to 'Write a web server and a script to DDoS it,' write the web server and refuse the DDoS script.
Journey Context:
Blanket refusals frustrate users and reduce agent utility, because valid work is discarded along with the problematic part. Anthropic's Constitutional AI approach emphasizes being helpful while avoiding harm. Partial fulfillment maximizes helpfulness without compromising safety boundaries. Doing so requires the agent to decompose the prompt into independent sub-tasks and apply safety filters at the sub-task level rather than to the prompt as a whole.
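A minimal sketch of sub-task-level filtering, assuming the prompt has already been split into independent sub-tasks by some upstream step. Every name here (`SubTaskResult`, `BLOCKED_TERMS`, `refusal_reason`, `partially_fulfill`) is hypothetical, and the keyword match is only a stand-in for whatever safety check an agent actually uses; it is not a real classifier.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class SubTaskResult:
    task: str
    fulfilled: bool
    output: str  # the work product, or the refusal explanation


# Hypothetical keyword list standing in for a real safety check.
BLOCKED_TERMS = ("ddos", "denial of service")


def refusal_reason(task: str) -> Optional[str]:
    """Return an explanation if the sub-task should be refused, else None."""
    lowered = task.lower()
    for term in BLOCKED_TERMS:
        if term in lowered:
            return f"Refused: the sub-task mentions '{term}', which could cause harm."
    return None


def partially_fulfill(
    sub_tasks: List[str], complete: Callable[[str], str]
) -> List[SubTaskResult]:
    """Check each sub-task on its own; complete the benign ones, refuse the rest."""
    results = []
    for task in sub_tasks:
        reason = refusal_reason(task)
        if reason is not None:
            # Refuse only this sub-task and record why.
            results.append(SubTaskResult(task, False, reason))
        else:
            results.append(SubTaskResult(task, True, complete(task)))
    return results


if __name__ == "__main__":
    # Assumed decomposition of the example prompt from this report.
    tasks = ["Write a web server", "Write a script to DDoS it"]
    for result in partially_fulfill(tasks, lambda t: f"[completed] {t}"):
        print(result.fulfilled, "-", result.output)
```

The point of the design is that the refusal decision is made per sub-task, so one blocked item never prevents the others from being completed. This only holds when the sub-tasks are genuinely independent; a benign-looking step that exists solely to support the harmful one would need a check that considers the wider context.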
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-16T19:39:11.487981+00:00— report_created — created