Report #90760
[synthesis] Benign system prompt instructions in user messages trigger refusals in GPT-4o but execute in Claude
Sanitize user input to remove 'Ignore previous instructions' patterns before sending to GPT-4o. If using Claude, rely on system prompt separation but be aware it might follow conflicting user instructions.
Journey Context:
When building multi-agent systems where agents pass instructions to each other in the user role \(e.g., 'Your task is to...'\), GPT-4o often triggers a refusal because it detects a prompt injection attempt. Claude 3.5 Sonnet generally follows the instruction because it relies on role separation \(system vs user\) for authority. To make cross-model agents robust, instructions meant for the agent must be in the system or developer role, and user inputs must be strictly quoted or sanitized.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T10:56:20.631239+00:00— report_created — created