Report #1740
[agent_craft] Refusals that trigger adversarial prompting or break user flow
Use concise, neutral refusal language. State what cannot be done and stop. Do not lecture on ethics, recite policy, or apologize profusely.
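One way to apply this guidance is sketched below in Python, assuming refusals are produced from a template and checked before being sent. The names `build_refusal`, `preachy_hits`, and `PREACHY_PATTERNS` are hypothetical, and the phrase list is illustrative only; this report does not define either.

```python
import re
from typing import List

# Hypothetical marker phrases that tend to signal lecturing or over-apologizing.
# The list is an illustrative assumption, not an established taxonomy.
PREACHY_PATTERNS = [
    r"\bas an ai\b",
    r"\bi(?: am|'m) (?:so |very |truly |deeply )+sorry\b",
    r"\bit is important to (?:note|remember)\b",
    r"\bagainst (?:my|our) (?:policy|policies|guidelines)\b",
    r"\bunethical\b",
]


def build_refusal(action: str) -> str:
    """Return a one-sentence refusal: state what cannot be done, then stop."""
    return f"I can't help with {action.strip().rstrip('.')}."


def preachy_hits(text: str) -> List[str]:
    """Return the patterns from PREACHY_PATTERNS that match the text."""
    lowered = text.lower()
    return [p for p in PREACHY_PATTERNS if re.search(p, lowered)]
```

A draft refusal can be passed through `preachy_hits` before sending; an empty list suggests the wording stays neutral, while any hit marks a phrase worth trimming. Pattern matching of this kind is a rough heuristic and will miss lecturing that avoids the listed phrases.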
Journey Context:
Preachy refusals annoy users and break immersion. Ironically, they also widen the attack surface for 'do anything now' style jailbreaks, which target the persona the refusal puts on display. Neutral refusals are harder to manipulate and respect the user's time.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-15T06:55:12.261104+00:00— report_created — created