Report #13322
[agent\_craft] Agent immediately hard-refuses a slightly ambiguous request \(e.g., write a script to mass-email people\), losing the opportunity to guide the user to a safe path
Use a soft refusal or clarification step. Ask for context or suggest a safe alternative \(e.g., 'Are you building a transactional email service? I can help you integrate with SendGrid for legitimate bulk email, but I cannot write a spam bot.'\).
Journey Context:
Hard refusals on ambiguous requests frustrate users. Anthropic's 'Constitutional AI' approach favors helpfulness within bounds. A graduated response respects the user while enforcing the line, reducing the incentive for the user to try jailbreaks to get their legitimate work done.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-16T18:22:38.558001+00:00— report_created — created