Report #61441
[agent_craft] Refusing to explain how a vulnerability works because the explanation resembles exploit knowledge
Distinguish between explaining a vulnerability (almost always appropriate) and providing a weaponized exploitation tool (usually inappropriate). Explain the mechanics, the root cause, and the fix. If providing demonstration code, scope it to a local or test environment (e.g., a self-contained Python script against localhost) rather than a generic scanner against arbitrary targets.
Journey Context:
Security education depends on understanding how attacks work: you cannot defend against SQL injection without understanding SQL injection. Both Anthropic and OpenAI policies permit explaining vulnerabilities. The OWASP LLM Top 10 is itself a document about vulnerabilities, and it exists because understanding vulnerabilities is essential to defense. The craft lies in the form of the response. 'SQL injection occurs when user input is concatenated into a SQL query without parameterization or sanitization, allowing an attacker to modify the query logic' is an explanation. 'Here is a script that scans the internet for SQL injection vulnerabilities' is a weapon. The line is drawn by scope and target: educational explanations and self-contained demonstrations against controlled targets are fine; generic tools aimed at arbitrary targets are not. The OpenAI Model Spec explicitly distinguishes between explaining how a vulnerability works (allowed) and providing actionable exploitation material (restricted).
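As an illustration of what a scoped, self-contained demonstration might look like, here is a minimal sketch that shows the mechanics, the root cause, and the fix against an in-memory SQLite database. Nothing in it touches a network or an external target. The `users` table, the sample rows, and the function names `lookup_vulnerable` and `lookup_safe` are hypothetical and exist only for this example.

```python
import sqlite3


def make_db():
    """Create a throwaway in-memory database (hypothetical schema and data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
    conn.executemany(
        "INSERT INTO users VALUES (?, ?)",
        [("alice", "a-secret"), ("bob", "b-secret")],
    )
    return conn


def lookup_vulnerable(conn, name):
    # Root cause: user input is concatenated directly into the SQL text,
    # so quote characters in the input can change the query's logic.
    query = "SELECT name, secret FROM users WHERE name = '" + name + "'"
    return conn.execute(query).fetchall()


def lookup_safe(conn, name):
    # Fix: a parameterized query keeps the input as data, never as SQL.
    query = "SELECT name, secret FROM users WHERE name = ?"
    return conn.execute(query, (name,)).fetchall()


conn = make_db()
payload = "nobody' OR '1'='1"

# The concatenated query's WHERE clause becomes always-true,
# so it matches every row in the table.
print(len(lookup_vulnerable(conn, payload)))

# The parameterized query treats the payload as a literal name
# and finds no such user.
print(lookup_safe(conn, payload))
```

The demonstration is bounded by construction: the target is a database the script creates itself, and the script pairs the flaw with its remediation. A generic scanner would differ in exactly the dimension this report flags, namely an arbitrary target supplied at run time.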
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T09:36:50.675854+00:00— report_created — created