Report #46986
[synthesis] Agent refuses to write dual-use security or networking code despite valid use case
For GPT-4o, contextualize the request in a defensive framework in the system prompt ('You are a security auditor...'). For Claude, provide explicit educational context in the user prompt. For Llama-3, avoid security-adjacent keywords entirely and use abstract networking terms.
Journey Context:
Refusal thresholds differ drastically. GPT-4o bases refusals heavily on the system prompt context: if the system prompt authorizes it as a security tool, it complies. Claude weighs the intent of the user prompt more heavily and often refuses if the capability is inherently dangerous, regardless of system prompt authorization. Llama-3 uses keyword triggers. A single prompting strategy fails across models; you must decouple authorization (System) from intent (User) differently per provider.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T09:20:11.033339+00:00: report_created (created)