Report #87263
[frontier] Agent selecting wrong tools or passing invalid arguments despite tools being available and correctly implemented
Treat tool descriptions as your highest-leverage prompt engineering surface. For each tool, write a one-sentence purpose statement, explicit use-this-when and do-NOT-use-this-when conditions, a concrete example of a well-formed call, and the common mistakes to avoid. Track tool selection accuracy as a first-class metric, and iterate on the descriptions based on the failure modes you observe.
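As a rough illustration of the four-part structure described above, the sketch below renders a description from structured fields. The `ToolDescription` class, the `search_orders` and `get_order` tool names, and all field contents are hypothetical, invented for this example; they do not come from any particular agent framework.

```python
from dataclasses import dataclass


@dataclass
class ToolDescription:
    """Hypothetical container for the parts of a tool description."""
    name: str
    purpose: str                 # one-sentence purpose statement
    use_when: list[str]          # explicit inclusion criteria
    do_not_use_when: list[str]   # explicit exclusion criteria
    example_call: str            # one concrete, well-formed call
    common_mistakes: list[str]   # known failure modes to warn about

    def render(self) -> str:
        """Join the parts into the text the model would see."""
        lines = [self.purpose, "", "Use this when:"]
        lines += [f"- {item}" for item in self.use_when]
        lines += ["", "Do NOT use this when:"]
        lines += [f"- {item}" for item in self.do_not_use_when]
        lines += ["", f"Example call: {self.example_call}"]
        lines += ["", "Common mistakes:"]
        lines += [f"- {item}" for item in self.common_mistakes]
        return "\n".join(lines)


# Hypothetical tool, used only to show the shape of a description.
search_orders = ToolDescription(
    name="search_orders",
    purpose="Look up a customer's existing orders by customer ID and date range.",
    use_when=["The user asks about the status or contents of a past order."],
    do_not_use_when=[
        "The user wants to place a new order.",
        "The order ID is already known (use get_order instead).",
    ],
    example_call='search_orders(customer_id="c_123", since="2024-01-01")',
    common_mistakes=["Passing a customer name instead of a customer ID."],
)

description_text = search_orders.render()
print(description_text)
```

Keeping the parts as separate fields makes it easy to notice when a tool is missing its exclusion criteria or its example call.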
Journey Context:
Most teams auto-generate tool descriptions from docstrings or write minimal one-liners. This is the single biggest cause of tool-use errors in production agents. The LLM's entire understanding of a tool comes from its description, so vague descriptions produce unreliable tool use. Teams that invest in description engineering see dramatic improvements: wrong-tool selections drop, argument errors decrease, and the agent needs fewer turns to complete tasks. The non-obvious insight is that stating when NOT to use a tool is as important as stating when to use it. Without exclusion criteria, agents will reach for the same hammer on every task. The tradeoff is the token cost of longer descriptions, but at a budget of 50 to 150 tokens per tool the accuracy gains far outweigh the context cost.
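One way to track the wrong-tool selections and the token budget mentioned above is sketched below, under loud assumptions: `keyword_selector` is a toy stand-in for a real agent call, the tool names and test prompts are invented, and the four-characters-per-token estimate is only a crude heuristic (a real tokenizer would give a more reliable count).

```python
from collections import Counter
from typing import Callable


def selection_accuracy(
    cases: list[tuple[str, str]],
    select_tool: Callable[[str], str],
) -> tuple[float, Counter]:
    """Return the fraction of prompts routed to the expected tool, plus a
    count of (expected, chosen) pairs for every wrong selection."""
    confusions: Counter = Counter()
    correct = 0
    for prompt, expected in cases:
        chosen = select_tool(prompt)
        if chosen == expected:
            correct += 1
        else:
            confusions[(expected, chosen)] += 1
    accuracy = correct / len(cases) if cases else 0.0
    return accuracy, confusions


def within_budget(description: str, low: int = 50, high: int = 150) -> bool:
    """Check a description against the 50 to 150 token budget.
    Assumption: roughly 4 characters per token."""
    approx_tokens = len(description) // 4
    return low <= approx_tokens <= high


def keyword_selector(prompt: str) -> str:
    """Toy selector standing in for the agent (hypothetical)."""
    return "refund_order" if "refund" in prompt.lower() else "search_orders"


# Hypothetical labeled cases: (user prompt, tool the agent should pick).
cases = [
    ("Find my order from last week", "search_orders"),
    ("Refund order 1234", "refund_order"),
    ("What is your refund policy?", "search_docs"),
]

accuracy, confusions = selection_accuracy(cases, keyword_selector)
print(f"accuracy={accuracy:.2f}")
for (expected, chosen), count in confusions.items():
    print(f"expected {expected}, got {chosen}: {count}")
```

The confusion pairs are the useful part: each one points at a specific description that may need a sharper exclusion condition.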
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T05:03:33.755472+00:00— report_created — created