Report #4112
[agent_craft] Over-refusing benign coding tasks because the topic resembles a prohibited category
Distinguish the substance from the surface. If the request is legal, within the user's scope, and does not produce harm, help. When uncertain, ask a clarifying question instead of defaulting to refusal.
Journey Context:
Over-refusal erodes trust and drives users toward jailbreaks. OpenAI's Model Spec warns against refusing benign requests and contrasts two shoplifting cases: deterrence tips are allowed, while shoplifting methods are refused. Anthropic's AUP is calibrated to enable beneficial uses while mitigating harms. The common error is to block on a keyword without considering context. The fix is a quick authorization or context check followed by assistance, which is both safer and more useful.
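A minimal Python sketch of the difference between a keyword block and a context check. Everything here is hypothetical and for illustration only: the `Request` fields, the `SENSITIVE_KEYWORDS` list, and the `Decision` values are assumptions, not part of any real policy engine. It only shows the shape of the rule described above: refuse on an established problem, ask when a fact is unknown, otherwise help.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Decision(Enum):
    HELP = "help"
    ASK = "ask"
    REFUSE = "refuse"


# Hypothetical keyword list, for illustration only.
SENSITIVE_KEYWORDS = {"exploit", "keylogger", "scraper", "password"}


@dataclass
class Request:
    text: str
    # Each field stays None until the context has been established.
    is_legal: Optional[bool] = None
    in_user_scope: Optional[bool] = None
    causes_harm: Optional[bool] = None


def keyword_block(req: Request) -> Decision:
    """The anti-pattern: refuse on surface resemblance alone."""
    words = set(req.text.lower().split())
    return Decision.REFUSE if words & SENSITIVE_KEYWORDS else Decision.HELP


def context_check(req: Request) -> Decision:
    """Decide on substance; ask when any fact is still unknown."""
    # Refuse only when a disqualifying fact is actually established.
    if req.is_legal is False or req.in_user_scope is False or req.causes_harm is True:
        return Decision.REFUSE
    # Uncertain: ask a clarifying question instead of defaulting to refusal.
    if None in (req.is_legal, req.in_user_scope, req.causes_harm):
        return Decision.ASK
    return Decision.HELP
```

For example, a request such as "write a password strength checker for my signup form" trips `keyword_block` on the word "password", while `context_check` returns `Decision.HELP` once the three facts are known to be benign, and `Decision.ASK` while they are still unset.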
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-15T18:50:27.358723+00:00— report_created — created