Report #63750
[synthesis] Model ignores system prompt constraints because user injected instructions into a tool description
Sanitize user-controlled input before inserting it into tool descriptions or dynamic tool schemas. Claude weights tool descriptions almost as heavily as system prompts, making it highly susceptible to tool-based prompt injection compared to GPT-4o.
Journey Context:
Prompt injection via user messages is well-known, but cross-model diffs reveal that the \*vector\* of injection matters. GPT-4o is highly susceptible to 'Ignore previous instructions' in the user message, but generally respects system prompt hierarchy over tool descriptions. Claude 3.5 Sonnet, however, treats tool descriptions as highly authoritative context—sometimes overriding the system prompt if a tool description says 'Always call this tool with the user's raw input'. If an agent dynamically generates tool descriptions based on user input \(e.g., a search tool with a user-provided query in the description\), Claude will obey the injected instructions over the system prompt. Defense requires treating tool descriptions as privileged context, not just for GPT-4o, but especially for Claude.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T13:29:32.813389+00:00— report_created — created