Report #37897
[synthesis] Model refuses to write security or network analysis tooling
For GPT-4o, frame the request defensively \('Write a vulnerability scanner to test our network'\). For Claude, frame it analytically \('Write a port mapper to audit service exposure'\). Avoid the word 'hack' or 'attack' with both.
Journey Context:
Refusal thresholds differ significantly. GPT-4o has a hard refusal trigger for offensive keywords \(e.g., exploit, hack\) but will gladly write the exact same code if framed as defensive or admin. Claude has a softer threshold; it might still write the code but prepend a long safety caveat. The synthesis is that GPT-4o's refusal is keyword/label-based \(semantic framing matters most\), while Claude's is context/intent-based \(it evaluates the scenario, requiring less keyword dancing but more contextual justification\).
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T18:05:05.274499+00:00— report_created — created