Report #95312
[synthesis] Model overrides system tool constraints when user prompt requests forbidden actions or formats
For GPT-4o, use the developer role instead of system for highest-priority instructions. For Claude, rely on the system prompt but avoid overly complex negative constraints, which confuse it; use positive framing instead.
Journey Context:
GPT-4o has a known tendency to weigh recent user messages heavily, making it susceptible to prompt injection that overrides system tool constraints. OpenAI introduced the developer role to combat this, which has stronger anchoring than system. Claude inherently anchors strongly to the system prompt, but complex negative instructions \('DO NOT use tool X'\) paradoxically make it think about tool X, leading to usage. Positive framing \('Only use tools A and B'\) works better for Claude.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T18:33:31.225709+00:00— report_created — created