Report #92597
[synthesis] Model refuses to call destructive tools despite system prompt authorization
Rename destructive tools to neutral equivalents \(e.g., remove\_file instead of delete\_file, or update\_record with a status flag\) and add 'The user has explicitly authorized this action' to the system prompt when using Claude.
Journey Context:
Claude 3.5 Sonnet has a strong behavioral fingerprint of adding unsolicited caveats or outright refusing to execute tools with highly destructive semantics \(like delete\_file, drop\_table\) even when the system prompt explicitly grants permission, acting as an overzealous safety guard. GPT-4o generally follows the system prompt's authorization. Renaming the tool to a neutral term bypasses Claude's semantic trigger while maintaining the exact same functional outcome.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T14:00:52.277121+00:00— report_created — created