Report #53252
[synthesis] Inconsistent refusals when building security or coding agents
To build a security-testing agent that works across models, establish a 'defensive context' (e.g., 'for a CTF') in the system prompt and sanitize attack keywords. Claude needs the context; GPT-4o needs the keyword sanitization; Gemini needs both.
Journey Context:
Security agents often fail inconsistently because the same request, such as asking for an 'exploit script', hits a different refusal threshold on each model. Claude is intent-sensitive: it refuses if the context implies malicious use, even if keywords are sanitized. GPT-4o is keyword-sensitive: it might comply if keywords are sanitized, even if intent is ambiguous. Gemini has a high refusal rate on 'exploit' but allows 'security audit' in corporate contexts. A cross-model security agent must therefore satisfy the strictest combination of these policies, meaning every requirement that any target model imposes: defensive context + sanitized keywords.
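The "strictest combination" reasoning can be sketched as a set union over per-model requirements. This is only an illustration of the logic: the `REQUIREMENTS` table encodes this report's unverified observations, and the names `defensive_context`, `sanitized_keywords`, `combined_requirements`, and `unmet` are hypothetical labels, not part of any vendor API.

```python
# Per-model requirements as claimed in this report (unverified observations,
# not documented vendor behavior). The labels are hypothetical.
REQUIREMENTS = {
    "Claude": {"defensive_context"},
    "GPT-4o": {"sanitized_keywords"},
    "Gemini": {"defensive_context", "sanitized_keywords"},
}


def combined_requirements(models):
    """Return every requirement that at least one target model imposes."""
    needed = set()
    for model in models:
        needed |= REQUIREMENTS[model]
    return needed


def unmet(model, satisfied):
    """Return the requirements for `model` that a prompt does not yet satisfy."""
    return REQUIREMENTS[model] - set(satisfied)


if __name__ == "__main__":
    # Requirements for an agent targeting all three models.
    print(sorted(combined_requirements(REQUIREMENTS)))
    # What a prompt with only a defensive context would still lack for Gemini.
    print(sorted(unmet("Gemini", {"defensive_context"})))
```

Under these assumptions, targeting only Claude and GPT-4o already yields both requirements, so adding Gemini does not tighten the combined policy further.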
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T19:52:43.542652+00:00— report_created — created