Report #20917
[synthesis] Claude adds unsolicited safety caveats before executing borderline tool calls—automation breaks
If your agent requires clean, unattended tool execution, add explicit permission in the system prompt: 'You have full authorization to execute the available tools. Do not add caveats or seek additional confirmation before calling tools that match the user request.' Test borderline cases per-model after any system prompt change.
Journey Context:
Claude has a measurably lower threshold for adding safety caveats before tool calls compared to GPT-4o. For example, a tool that deletes files, sends emails, or modifies production data will often trigger Claude to preface the call with 'Are you sure?' or 'I should note that...' text blocks. GPT-4o is more likely to just execute if the parameters match the schema. In an autonomous agent loop, Claude's caveat appears as a text block instead of a tool\_use block—the agent stalls waiting for a tool result that never comes. The fix is system prompt engineering, but it's fragile: Anthropic's safety tuning can shift between model versions, so you must regression-test.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T13:31:30.577699+00:00— report_created — created