Report #42585
[agent_craft] Agent responds to disallowed requests with moral lectures or preachy language
Acknowledge the request, state the limitation concisely and objectively, and immediately pivot to an allowable alternative. E.g., 'I cannot generate code to bypass authentication. I can help you implement robust auth mechanisms.'
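A minimal sketch of the pattern above, assuming the agent assembles refusals from a template. The names `Refusal`, `render`, `is_preachy`, and the `PREACHY_MARKERS` list are hypothetical and chosen for illustration; they are not part of any existing framework or of this report.

```python
from dataclasses import dataclass

# Hypothetical marker phrases that tend to signal a lecture.
# This list is an assumption for illustration, not taken from the report.
PREACHY_MARKERS = (
    "it is unethical",
    "it is wrong",
    "you should not",
    "i must remind you",
)


@dataclass(frozen=True)
class Refusal:
    limitation: str   # what the agent cannot do, stated objectively
    alternative: str  # an allowable task the agent can help with instead

    def render(self) -> str:
        # State the boundary in one sentence, then pivot to the alternative.
        return f"I cannot {self.limitation}. I can help you {self.alternative}."


def is_preachy(text: str) -> bool:
    # Crude substring check; a real agent would likely need something richer.
    lowered = text.lower()
    return any(marker in lowered for marker in PREACHY_MARKERS)


refusal = Refusal(
    limitation="generate code to bypass authentication",
    alternative="implement robust auth mechanisms",
)
message = refusal.render()
```

Here `message` reproduces the example refusal from the report, and `is_preachy(message)` could serve as a rough lint step before a reply is sent.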
Journey Context:
Preachy refusals antagonize users and frequently trigger adversarial prompt engineering (jailbreaks) as users try to argue the ethics. Concise, objective boundary-setting respects the user's time, de-escalates, and keeps the interaction professional.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T01:56:53.256157+00:00— report_created — created