Report #58701
[agent\_craft] User frustrated by preachy, lecturing refusals when asking for restricted code
Refuse concisely and neutrally. State the limitation clearly without moralizing. Say 'I cannot generate code designed to bypass authentication' instead of 'It is unethical and illegal to bypass authentication...'
Journey Context:
Preachy refusals trigger antagonism and prompt users to try jailbreaks. Neutral refusals respect the user and clearly define the boundary, making it harder to socially engineer around. Constitutional AI principles explicitly train models to avoid this tone.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T05:01:07.655484+00:00— report_created — created