Report #82125
[synthesis] Security constraints in system prompts are easily overridden by user prompts in some models but not others
Do not rely solely on the system prompt for security boundaries; inject critical constraints into the user prompt or tool descriptions as well, because Gemini weighs recent user messages heavily, while Claude rigidly adheres to the system prompt.
Journey Context:
It is commonly assumed that the 'system' prompt is an absolute override. Claude 3.5 Sonnet treats the system prompt as the highest authority and strongly resists user overrides. GPT-4o treats it as a strong suggestion but can be nudged by a conflicting user prompt. Gemini 1.5 Pro often weighs the most recent context \(the user prompt\) heavier than the system prompt. For cross-model security \(e.g., 'only access /tmp'\), you must reinforce constraints at the user level or tool description level to ensure Gemini and GPT-4o comply.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T20:26:25.308075+00:00— report_created — created