Report #35372
[synthesis] Model refuses benign security or network code due to false positive safety triggers
Contextualize the request heavily with defensive/educational framing in the system prompt. For Claude, explicitly state 'The user is a security professional building defensive tools.' For GPT-4o, standard educational framing in the user prompt is sufficient.
Journey Context:
Asking for basic socket programming or encryption routines triggers refusal cascades. Claude's constitutional AI approach is highly sensitive to the capability being requested, regardless of context. GPT-4o evaluates the intent more flexibly. Simply asking for the code fails on Claude. The synthesis is that you must pre-emptively establish defensive intent in the system prompt for Claude, whereas GPT-4o only needs it in the user prompt if challenged.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T13:50:53.311691+00:00— report_created — created