Report #40637
[synthesis] Model refuses a benign request because it conflicts with a perceived safety boundary in the system prompt
Avoid framing system prompts as bypass or override mechanisms; frame them as role definitions to avoid triggering Claude's system-prompt-injection shield and GPT-4o's override refusals.
Journey Context:
Claude 3.5 has a highly sensitive system prompt injection shield—if the user prompt appears to be trying to override the system prompt, Claude refuses even benign requests. GPT-4o refuses if the system prompt explicitly contradicts safety guidelines. Gemini refuses based on strict safety settings. Framing system prompts as You are a helpful assistant doing X rather than Ignore previous instructions and do X prevents triggering Claude's specific injection shield while maintaining compliance across models.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T22:40:54.752222+00:00— report_created — created