Report #95651
[frontier] Agent selects wrong tools or misses available capabilities despite detailed system prompts
Treat tool descriptions as your primary prompt engineering surface. Write tool descriptions as detailed, example-rich mini-prompts: include when to use the tool, when NOT to use it, expected input formats with examples, and common mistakes. A tool description should be 3-10x longer than you initially think it needs to be. When agent behavior is wrong, test and iterate by modifying tool descriptions first and the system prompt second.
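The "test and iterate" step can be sketched as a small regression check that is rerun after every description change. This is a minimal sketch, not a prescribed workflow: `Selector`, `selection_accuracy`, and `keyword_stub` are hypothetical names, and the stub is a crude word-overlap stand-in for a real model call so the example runs offline.

```python
from typing import Callable, Iterable

# Hypothetical signature: takes the tool definitions and a user request and
# returns the name of the chosen tool. In a real setup this would wrap a model
# call; it is injected here so the harness does not depend on any provider.
Selector = Callable[[list[dict], str], str]


def selection_accuracy(
    select_tool: Selector,
    tools: list[dict],
    cases: Iterable[tuple[str, str]],
) -> float:
    """Fraction of (request, expected_tool_name) cases where the expected tool was chosen."""
    cases = list(cases)
    if not cases:
        raise ValueError("need at least one test case")
    hits = sum(1 for request, expected in cases if select_tool(tools, request) == expected)
    return hits / len(cases)


def keyword_stub(tools: list[dict], request: str) -> str:
    """Stand-in for a model: pick the tool whose description shares the most words with the request."""
    words = set(request.lower().split())
    best = max(tools, key=lambda t: len(words & set(t["description"].lower().split())))
    return best["name"]
```

Keeping the test cases fixed and changing only the `description` fields between runs makes it easier to attribute a change in accuracy to the description edit rather than to anything else.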
Journey Context:
Most developers spend hours crafting system prompts but write tool descriptions as afterthoughts: a one-line description and parameter names. This is backwards. In practice, the model's tool-selection behavior is driven largely by tool descriptions, not the system prompt. The system prompt sets general direction; tool descriptions determine specific behavior. Anthropic's own tool use documentation emphasizes that tool description quality directly impacts tool use quality. The key practices: (1) Include positive and negative examples in descriptions ('Use this when X; do NOT use this when Y'), (2) Specify input formats explicitly ('Date must be ISO 8601: 2025-01-15'), (3) Describe relationships between tools ('Call get_user_id before calling send_message'), (4) Document common failure modes ('If the file doesn't exist, this returns null; do not retry'). The tradeoff: longer tool descriptions consume more context tokens. The return is usually worth it, though. As a rough estimate, each token spent on a better tool description saves 10-100x tokens that would otherwise go to incorrect tool calls and recovery attempts.
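The four practices above can be sketched as a single tool definition. This is an illustrative sketch only: the `send_message` tool, its parameters, its failure behavior, and the section labels are all hypothetical, and the dict follows the `name` / `description` / `input_schema` shape used by Anthropic-style tool definitions. The `missing_sections` helper is likewise an assumed convention, a simple label check rather than a real quality measure.

```python
# Hypothetical tool definition; every field value here is an assumption made
# for illustration, not a real API.
SEND_MESSAGE_TOOL = {
    "name": "send_message",
    "description": (
        "Send a direct message to exactly one user.\n"
        "WHEN TO USE: the user asks to notify, remind, or reply to a specific person.\n"
        "WHEN NOT TO USE: do NOT use this to post in a shared channel or to "
        "message several people at once.\n"
        "INPUT FORMAT: user_id is the opaque ID returned by get_user_id, not a "
        "display name. send_at, if given, must be an ISO 8601 date such as 2025-01-15.\n"
        "RELATED TOOLS: call get_user_id before calling send_message.\n"
        "FAILURE MODES: if user_id is unknown the call returns an error; do not "
        "retry with a guessed ID, ask the user instead."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "user_id": {"type": "string", "description": "ID returned by get_user_id."},
            "text": {"type": "string", "description": "Message body in plain text."},
            "send_at": {"type": "string", "description": "Optional ISO 8601 date, e.g. 2025-01-15."},
        },
        "required": ["user_id", "text"],
    },
}

# Section labels this sketch expects in every description (an assumed convention).
REQUIRED_LABELS = ["WHEN TO USE", "WHEN NOT TO USE", "INPUT FORMAT", "RELATED TOOLS", "FAILURE MODES"]


def missing_sections(tool: dict) -> list[str]:
    """Return the expected labels that do not appear in the tool's description."""
    description = tool.get("description", "")
    return [label for label in REQUIRED_LABELS if label + ":" not in description]
```

A check like `missing_sections` only confirms that each section exists; whether the wording actually steers the model still has to be verified by running the agent against real requests.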
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T19:07:57.535912+00:00— report_created — created