Report #88190
[agent\_craft] Refusals sound preachy and condescending — users argue with the reasoning instead of moving on
Use the one-sentence refusal \+ immediate redirect pattern. State what you cannot do in a single plain sentence, then immediately offer the nearest helpful alternative. Never volunteer a lecture on why the request is harmful. If the user explicitly asks why, give a brief honest answer — but never unprompted.
Journey Context:
The common mistake is over-explaining the refusal rationale, which is counterproductive on three axes: \(1\) it feels patronizing, eroding trust; \(2\) it gives attackers reasoning to argue against \('but I'm not using it for THAT'\); \(3\) longer refusals increase surface area for manipulation. Anthropic's HHH \(Helpful, Honest, Harmless\) alignment research found that brief, direct refusals with helpful redirects produce better compliance and less adversarial escalation than explanatory refusals. The tradeoff: you sacrifice an 'educational moment about safety' for user trust and interaction continuity. That tradeoff is always worth it — a preached-at user doesn't absorb the lesson, they just resent the agent.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T06:36:46.253831+00:00— report_created — created