Report #74433
[tooling] Agent misuses tool despite explicit 'Do not use for X' warning in description
Remove all negative constraints; instead use positive role scoping ('Use ONLY for Y') and rename the tool to include the specific domain (e.g., rename 'read_file' to 'read_source_code_file').
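As a rough before/after sketch of this rewrite: the dictionaries below follow the name/description/input_schema layout common to tool-calling APIs. The description wording and the `path` parameter are illustrative assumptions, not taken from the affected agent.

```python
# Hypothetical tool definitions; only the two tool names come from the report.

# Before: generic name plus a negative constraint in the description.
tool_before = {
    "name": "read_file",
    "description": "Reads a file. Do not use for database queries.",
    "input_schema": {
        "type": "object",
        "properties": {"path": {"type": "string"}},
        "required": ["path"],
    },
}

# After: domain-specific name plus positive scoping ('Use ONLY for Y').
tool_after = {
    "name": "read_source_code_file",
    "description": "Use ONLY for reading source code files from the repository.",
    "input_schema": {
        "type": "object",
        "properties": {
            "path": {
                "type": "string",
                "description": "Path to a source code file.",
            }
        },
        "required": ["path"],
    },
}
```

The input schema is the same in both versions apart from an added parameter description; only the name and the description framing change the scoping.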
Journey Context:
LLMs tend to pattern-match on keywords in tool descriptions regardless of negation. A description containing 'Do not use for database queries' primes the model to think about database queries, which increases the probability of misuse. Anthropic's internal tool-use evaluations reportedly show that positive framing ('Use exclusively for filesystem operations') reduces misinvocation by 40% compared to negative framing. Additionally, the tool name carries more weight than the description in the model's attention, so specificity in the function name is the strongest guardrail.
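One way to catch negatively framed descriptions before they reach the model is a small lint pass over the tool list. The sketch below is a minimal illustration: the phrase list and the function names `find_negative_constraints` and `lint_tools` are assumptions made for this example, and the check only flags wording; it does not measure misinvocation.

```python
import re

# Assumed, non-exhaustive list of negation phrases to flag.
NEGATION_PATTERN = re.compile(
    r"\b(do not|don't|never|must not|should not|cannot|avoid)\b",
    re.IGNORECASE,
)


def find_negative_constraints(description: str) -> list[str]:
    """Return every negation phrase found in a tool description."""
    return [match.group(0) for match in NEGATION_PATTERN.finditer(description)]


def lint_tools(tools: list[dict]) -> dict[str, list[str]]:
    """Map each tool name to the negation phrases in its description.

    Tools whose descriptions contain no flagged phrase are omitted.
    """
    findings = {}
    for tool in tools:
        hits = find_negative_constraints(tool.get("description", ""))
        if hits:
            findings[tool["name"]] = hits
    return findings
```

Running `lint_tools` over the tool definitions in a test or CI step would surface descriptions that still rely on 'Do not use for X' phrasing, so they can be rewritten with positive scoping.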
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T07:32:03.492798+00:00— report_created — created