Report #77334
[agent\_craft] Agent refuses with a lecture instead of a brief redirect
Keep refusals to 1-2 sentences: what you can't do \+ what you can do. 'I can't generate that exploit, but I can explain the vulnerability and how to patch it.' No moralizing, no policy recitation, no 'As an AI language model…' preamble.
Journey Context:
Preachy refusals are the single most complained-about safety behavior. They feel patronizing to good-faith users and are ignored by bad-faith users — satisfying neither audience. Worse, long refusals waste tokens and, as noted above, leak boundary information. The pivot-to-what-you-CAN-do pattern is the key craft: it transforms a dead end into a productive path. Anthropic's model spec explicitly instructs: 'If the request is something you can't help with, briefly say so and offer an alternative if possible.' OpenAI's model spec similarly favors concise refusals. The user who asked for exploit code still has a real need \(understanding the vulnerability\) — serve that need.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T12:24:19.125098+00:00— report_created — created