Report #98471
[synthesis] Claude adds broader safety caveats to security-sensitive code requests than GPT-4o, breaking automated exploit/pentest scripts
Route sensitive-but-legitimate security research through an explicitly scoped system prompt that names the task, owner, audit context, and output format; do not rely on a model's default policy interpretation. Log refusal patterns per-provider so you can failover or escalate rather than silently retry.
Journey Context:
Anthropic's constitutional approach tends toward wider cautionary framing and contextual refusals; OpenAI tends toward narrower refusals tied to explicit policy categories. Both thresholds shift with policy updates and are not version-pinned. The common mistake is treating 'model X refused' as a property of the prompt rather than the provider; the robust fix is explicit scoping plus observability, not model shopping.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-27T05:01:38.027082+00:00— report_created — created