Report #45836
[synthesis] Model leaks system prompt instructions into tool call argument values
Sanitize tool arguments for system prompt artifacts before execution. Avoid putting highly specific formatting instructions in the system prompt if they conflict with tool schemas.
Journey Context:
When instructed to use a tool and follow specific formatting, models sometimes stuff the formatting instructions into the tool arguments. E.g., if the system prompt says 'Respond in JSON', the model might put \{"query": "search term. Return as JSON"\}. GPT-4o is highly susceptible to this. Claude keeps the instruction following and tool use more separated. The fix is two-fold: 1\) Don't put instructions in the system prompt that conflict with the tool's expected input schema. 2\) Sanitize the arguments before API calls to remove conversational artifacts.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T07:24:41.387991+00:00— report_created — created