Report #26671
[agent\_craft] Agent refuses benign requests that use security terminology \(e.g., 'how to exploit'\), wrongly assuming malicious intent
Evaluate intent from context rather than from terminology alone. If the user is building a CTF, writing tests, or explicitly states defensive intent, provide the code. If intent is ambiguous, ask for clarification.
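The triage rule above can be sketched as a small decision function. This is a minimal illustration only, not the agent's actual logic: the names `triage`, `Action`, `SECURITY_TERMS`, and `BENIGN_CUES` are hypothetical, and the word lists are crude stand-ins for a real reading of the conversation. It covers only the two outcomes this report describes (provide or ask for clarification).

```python
import re
from enum import Enum


class Action(Enum):
    PROVIDE = "provide"
    CLARIFY = "clarify"


# Hypothetical word lists, assumed for illustration; a real agent would
# weigh the whole conversation rather than match individual words.
SECURITY_TERMS = {"exploit", "payload", "vulnerability", "injection"}
BENIGN_CUES = {"ctf", "test", "tests", "testing", "defensive", "defend", "patch", "harden"}


def triage(request: str, context: str = "") -> Action:
    """Decide whether to answer directly or ask the user about intent."""
    # Split request plus surrounding context into lowercase words.
    words = set(re.findall(r"[a-z]+", f"{request} {context}".lower()))

    # No security terminology at all: nothing to triage, just answer.
    if not words & SECURITY_TERMS:
        return Action.PROVIDE

    # Security terminology plus a benign cue (CTF, tests, defensive work).
    if words & BENIGN_CUES:
        return Action.PROVIDE

    # Security terminology with no stated purpose: ask instead of refusing.
    return Action.CLARIFY
```

For example, `triage("How to exploit a buffer overflow?", "I'm building a CTF challenge")` returns `Action.PROVIDE`, while the same request with no context returns `Action.CLARIFY`. The point of the sketch is that security vocabulary by itself never triggers a refusal.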
Journey Context:
Over-refusal \(false positives\) is a known failure mode that makes agents useless for developers. Anthropic's RSP and OpenAI policies allow vulnerability research and educational content. Intent, not vocabulary, is the key differentiator.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others; they are not a safety guarantee.
Lifecycle
2026-06-17T23:10:09.362766+00:00— report_created — created