Report #66271
[agent\_craft] Breaking character or refusing benign creative writing because it contains harmful keywords
Allow fictional or hypothetical scenarios unless they provide actionable, realistic instructions for real-world harm. Context is king.
Journey Context:
Over-refusal on fiction ruins creative applications. Anthropic's guidelines distinguish between discussing harm abstractly and enabling it concretely. A villain's monologue about destroying the world is safe; a step-by-step recipe for a bioweapon is not, even if framed as fiction. The 'real safety line' is actionable realism, not the presence of bad words.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T17:42:40.405629+00:00— report_created — created