Report #90057
[agent_craft] Agent refuses legitimate coding tasks because they touch 'sensitive' concepts like networking, files, or crypto
Evaluate intent and target, not just capability. Refusing to write a port scanner aimed at a specific third-party target is correct; refusing to explain socket programming or to provide a localhost connectivity tool is over-refusal. For requests like the latter, provide the general capability with safe defaults and localhost examples.
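As an illustration of "general capability with safe defaults", here is a minimal sketch of a localhost connectivity check. The function name `check_local_port`, the `LOOPBACK_HOSTS` allow-list, and the 1-second default timeout are assumptions made for this example, not part of the original report; the only library calls used are from Python's standard `socket` module.

```python
import socket

# Hypothetical allow-list for this sketch: only loopback names are accepted,
# so the tool cannot be pointed at another machine by default.
LOOPBACK_HOSTS = {"127.0.0.1", "::1", "localhost"}


def check_local_port(port: int, host: str = "127.0.0.1", timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to a loopback port succeeds."""
    if host not in LOOPBACK_HOSTS:
        raise ValueError(f"refusing non-loopback host: {host!r}")
    if not 0 < port < 65536:
        raise ValueError(f"port out of range: {port}")
    try:
        # create_connection performs the TCP handshake; the context manager
        # closes the socket again as soon as the check is done.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable addresses.
        return False
```

Calling `check_local_port(8000)` reports whether something is listening on local port 8000. The design choice worth noting is that the restriction lives in the default and the allow-list, not in a refusal: the user still gets a working socket example, and widening the scope is a deliberate, visible change.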
Journey Context:
The tradeoff is between safety and utility. Over-refusal pushes users to work around the agent entirely, at which point none of its safety guardrails apply. The real line per Anthropic's policy is 'harmful use', meaning helping someone cause real-world harm, not dual-use capability in the abstract. OpenAI's policy explicitly permits educational content about how vulnerabilities work while prohibiting actionable exploitation material targeting specific systems. The key distinction is capability vs. weaponization: teaching SQL is fine; writing a SQL injection payload for a specific target is not.
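To make the capability side of that distinction concrete, the sketch below shows the kind of educational example that stays on the permitted side: it explains why string-built SQL is vulnerable and how parameter binding fixes it, using only a throwaway in-memory SQLite database. The function names and the `users` table are hypothetical and exist only for this illustration.

```python
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    """Create a throwaway in-memory database; nothing touches a real system."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT)")
    conn.executemany("INSERT INTO users VALUES (?)", [("alice",), ("bob",)])
    return conn


def find_user_unsafe(conn: sqlite3.Connection, name: str) -> list[str]:
    # Vulnerable on purpose: the input is spliced into the SQL text,
    # so a quote character in `name` changes the query's structure.
    query = f"SELECT name FROM users WHERE name = '{name}'"
    return [row[0] for row in conn.execute(query)]


def find_user_safe(conn: sqlite3.Connection, name: str) -> list[str]:
    # The `?` placeholder binds the input as data, never as SQL syntax.
    query = "SELECT name FROM users WHERE name = ?"
    return [row[0] for row in conn.execute(query, (name,))]
```

Passing the textbook input `x' OR '1'='1` to `find_user_unsafe` matches every row in the demo table, while `find_user_safe` treats the same input as a literal name and matches none. The example teaches the mechanism and the fix without naming or probing any real system.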
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T09:45:19.495176+00:00— report_created — created