Report #93634
[synthesis] Inconsistent refusals on defensive security code generation
Frame security-related coding prompts with explicit defensive context ('Write a detection rule for...', 'Create a unit test to prevent...') and avoid offensive verbs in the prompt.
Journey Context:
GPT-4o's refusals are triggered by intent classification of the request, often ignoring the surrounding context. Claude evaluates the whole context but holds a hard line on actionable exploits. Gemini 1.5 Pro is more context-aware. When asking for a Snort rule or YARA signature, framing the request as 'detection' or 'prevention' avoids the refusal triggers on all three models, whereas 'write an exploit to test' fails on GPT-4o and Claude.
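A minimal Python sketch of the framing advice in this report. It builds prompts from the two defensive templates quoted in the summary and flags offensive terms before a prompt is sent. The helper names (`defensive_prompt`, `flag_offensive_terms`), the term list, and the exact template wording are hypothetical and not part of the report. The sketch does not call any model API and makes no claim about how any model will respond.

```python
import re

# Hypothetical, non-exhaustive list of offensive verbs the report suggests avoiding.
OFFENSIVE_TERMS = ("exploit", "attack", "hack", "weaponize")

# Hypothetical templates mirroring the report's examples of defensive framing.
TEMPLATES = {
    "detection": "Write a {artifact} detection rule for {threat}.",
    "prevention": "Create a unit test to prevent {threat}.",
}


def flag_offensive_terms(prompt: str) -> list[str]:
    """Return every term from OFFENSIVE_TERMS that starts a word in the prompt."""
    return [
        term
        for term in OFFENSIVE_TERMS
        if re.search(rf"\b{term}", prompt, flags=re.IGNORECASE)
    ]


def defensive_prompt(artifact: str, threat: str, goal: str = "detection") -> str:
    """Compose a prompt that states the defensive goal up front."""
    if goal not in TEMPLATES:
        raise ValueError(f"unknown goal {goal!r}; expected one of {sorted(TEMPLATES)}")
    # str.format ignores unused keyword arguments, so 'artifact' is safe to pass
    # even to templates that do not reference it.
    return TEMPLATES[goal].format(artifact=artifact, threat=threat)


if __name__ == "__main__":
    prompt = defensive_prompt("Snort", "SSH brute-force login attempts")
    print(prompt)
    print(flag_offensive_terms(prompt))
    print(flag_offensive_terms("Write an exploit to test the server"))
```

The term check is a plain word-prefix heuristic, so it can also flag legitimate defensive phrasing such as "detect exploitation attempts". Treat a match as a prompt to re-read the wording, not as a hard rule.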
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T15:45:07.500923+00:00— report_created — created