Report #82230
[frontier] Agent selects wrong tools or calls tools with wrong arguments despite good system prompts
Treat tool descriptions as the primary programming interface for agent behavior. Invest at least as much effort in tool descriptions as in the system prompt. In each tool description, include explicit preconditions, postconditions, expected state changes, and one or two concrete usage examples. Test tool descriptions in isolation, without relying on the system prompt.
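One way to test descriptions in isolation is a small harness that hands a selector only the tool names and descriptions (no system prompt) and records where it picks the wrong tool. The sketch below is illustrative, not a reference implementation: the tool names, the test cases, and `first_tool_selector` are all hypothetical, and the placeholder selector should be replaced by a real model call.

```python
from typing import Callable

# Hypothetical tools: name -> description text only (no system prompt involved).
TOOLS = {
    "search_users_by_email": (
        "Look up user accounts by email address. "
        "Returns user_id, email, and display_name for each match."
    ),
    "delete_user": (
        "Permanently delete a user account by user_id. "
        "Only call this after the user has confirmed the deletion."
    ),
}

# Hypothetical test cases: (user request, tool we expect to be selected).
CASES = [
    ("Find the account for alice@example.com", "search_users_by_email"),
    ("Remove user 42 for good, they have confirmed", "delete_user"),
]

ToolSelector = Callable[[str, dict[str, str]], str]


def first_tool_selector(request: str, tools: dict[str, str]) -> str:
    """Placeholder that always picks the first tool.

    Replace with a function that sends `request` plus the tool
    descriptions to a model and returns the chosen tool name.
    """
    return next(iter(tools))


def run_selection_cases(
    select: ToolSelector,
    tools: dict[str, str],
    cases: list[tuple[str, str]],
) -> list[tuple[str, str, str]]:
    """Return (request, expected, chosen) for every mis-selected case."""
    failures = []
    for request, expected in cases:
        chosen = select(request, tools)
        if chosen != expected:
            failures.append((request, expected, chosen))
    return failures


failures = run_selection_cases(first_tool_selector, TOOLS, CASES)
for request, expected, chosen in failures:
    print(f"expected {expected}, got {chosen}: {request!r}")
```

Rerunning the same cases after each description change shows whether an edit helped or hurt selection, independent of any system prompt changes.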
Journey Context:
A common mistake is to write minimal tool descriptions ('Searches the database for users') and put all behavioral logic in the system prompt. In practice, LLMs appear to select tools primarily from the tool description text rather than from the system prompt: the system prompt sets general intent, while the tool description determines which tool is selected and how it is called. Well-engineered tool descriptions with examples, preconditions ('Only call this after authenticate_user has succeeded'), and constraints ('max_results must be ≤ 100') can substantially improve selection accuracy. This is analogous to API documentation, where better docs reduce support tickets. Anthropic's tool-use best practices likewise recommend detailed tool descriptions. The anti-pattern to avoid is using the tool name as implicit documentation. Name the tool 'search_users_by_email' rather than 'search', and describe what it returns, not just what it does.
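As an illustration of the points above, here is a sketch of what a fuller tool definition could look like. It assumes the common name / description / input_schema layout with JSON Schema for arguments. The returned fields, the lower bound of 1 on max_results, and the `check_arguments` helper are assumptions made for the example, not part of the original report.

```python
# Hypothetical tool definition. The description states the precondition,
# return value, state changes, constraints, and one concrete example.
SEARCH_USERS_BY_EMAIL = {
    "name": "search_users_by_email",
    "description": (
        "Find user accounts whose email address matches the query.\n"
        "Precondition: only call this after authenticate_user has succeeded.\n"
        "Returns: a list of objects with user_id, email, and display_name; "
        "an empty list if nothing matches.\n"
        "State changes: none (read-only).\n"
        "Constraints: max_results must be between 1 and 100.\n"
        'Example input: {"email": "alice@example.com", "max_results": 5}'
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "email": {
                "type": "string",
                "description": "Email address to search for.",
            },
            "max_results": {
                "type": "integer",
                "minimum": 1,  # assumed lower bound
                "maximum": 100,  # upper bound taken from the text
                "description": "Maximum number of matches to return.",
            },
        },
        "required": ["email"],
    },
}


def check_arguments(tool: dict, args: dict) -> list[str]:
    """Minimal hand-rolled check: required keys and integer bounds only.

    This is not a full JSON Schema validator.
    """
    schema = tool["input_schema"]
    problems = []
    for key in schema.get("required", []):
        if key not in args:
            problems.append(f"missing required argument: {key}")
    for key, value in args.items():
        spec = schema["properties"].get(key)
        if spec is None:
            problems.append(f"unknown argument: {key}")
            continue
        if spec.get("type") == "integer":
            # bool is a subclass of int in Python, so exclude it explicitly.
            if not isinstance(value, int) or isinstance(value, bool):
                problems.append(f"{key} must be an integer")
                continue
            if "minimum" in spec and value < spec["minimum"]:
                problems.append(f"{key} must be >= {spec['minimum']}")
            if "maximum" in spec and value > spec["maximum"]:
                problems.append(f"{key} must be <= {spec['maximum']}")
    return problems


print(check_arguments(SEARCH_USERS_BY_EMAIL, {"email": "alice@example.com", "max_results": 500}))
```

Repeating the constraint in both the description and the schema is deliberate in this sketch: the description guides the model's choice of arguments, while the schema lets the caller reject a bad call before it runs.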
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T20:37:08.906102+00:00— report_created — created