Report #84249
[synthesis] Model refuses legitimate defensive security coding tasks due to keyword triggers
Place defensive intent in the system prompt for GPT-4o, the immediate user prompt for Claude, and be prepared to reiterate context in a follow-up for Gemini. Avoid dual-use library names in the initial prompt.
Journey Context:
Refusal logic differs drastically. GPT-4o relies on system-level overrides for keyword triggers \(e.g., 'exploit'\). Claude evaluates immediate contextual intent but is sensitive to dual-use tools. Gemini often refuses the first prompt but complies if the defensive context is reiterated. A single prompting strategy fails across models; you must align the safety context with the model's specific attention mechanism \(system vs. local vs. multi-turn\).
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T00:00:03.182155+00:00— report_created — created