Report #51720
[frontier] Agent calls wrong tool or passes incorrect arguments despite well-crafted system prompts
Treat tool descriptions and parameter schemas as the primary interface between the agent and its capabilities, and invest in them accordingly. Write tool descriptions like API documentation: include purpose, when to use, when NOT to use, concrete examples, and edge cases. Add a JSON Schema description to every parameter. Use enums for categorical parameters. Track tool selection accuracy as a first-class metric in your eval suite.
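As a rough illustration of tracking tool selection accuracy as an eval metric, the sketch below scores a selector against labelled cases. Everything in it is hypothetical: `ToolCase`, `tool_selection_accuracy`, the tool names, and `keyword_selector`, which is a deliberately naive stand-in for a real agent call so the example runs offline.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class ToolCase:
    """One eval case: a user request and the tool the agent should pick."""
    prompt: str
    expected_tool: str


def tool_selection_accuracy(
    cases: list[ToolCase],
    select_tool: Callable[[str], str],
) -> float:
    """Return the fraction of cases where the selected tool matches the expected one."""
    if not cases:
        raise ValueError("need at least one eval case")
    hits = sum(1 for case in cases if select_tool(case.prompt) == case.expected_tool)
    return hits / len(cases)


def keyword_selector(prompt: str) -> str:
    """Hypothetical stand-in for the agent: picks a tool by keyword only."""
    return "validate_address" if "address" in prompt.lower() else "lookup_order"


# Hypothetical eval cases; the third is a "when NOT to use" trap.
CASES = [
    ToolCase("Check this mailing address: 1 Main St", "validate_address"),
    ToolCase("Where is order 1234?", "lookup_order"),
    ToolCase("Is bob@example.com a valid email address?", "validate_email"),
]

accuracy = tool_selection_accuracy(CASES, keyword_selector)
print(f"tool selection accuracy: {accuracy:.2f}")
```

In a real suite, `select_tool` would wrap the model call and return the name of the tool it chose. Including negative cases such as the email prompt is one way to check that "when NOT to use" guidance is actually being followed.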
Journey Context:
Teams spend hours on system prompts but treat tool descriptions as an afterthought; a one-line description like 'processes data' is common. In practice, the LLM's tool selection is driven largely by the tool name and description, not the system prompt. A vague description leads to wrong tool calls. A precise description tells the model both when to call the tool and when not to, for example: 'Validates and normalizes US postal addresses. Use when you need to verify or format a mailing address. Do NOT use for email addresses or phone numbers. Returns a standardized address object with ZIP+4.' Parameter descriptions matter equally: an enum of allowed values guards against hallucinated arguments, and a description like 'ISO 3166-1 alpha-2 country code (e.g., US, GB, DE)' steers the model away from free-text country names. This is the new frontier of prompt engineering: the tool schema IS the prompt for tool selection.
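A minimal sketch of what this can look like in practice, assuming the common "name / description / JSON Schema parameters" shape for tool definitions. The tool name `validate_us_address`, the `output_format` parameter, the helper functions, and the ten-word threshold are all illustrative assumptions, not part of any particular SDK.

```python
from typing import Any

# Hypothetical tool definition using the description quoted above.
ADDRESS_TOOL: dict[str, Any] = {
    "name": "validate_us_address",
    "description": (
        "Validates and normalizes US postal addresses. "
        "Use when you need to verify or format a mailing address. "
        "Do NOT use for email addresses or phone numbers. "
        "Returns a standardized address object with ZIP+4."
    ),
    "parameters": {
        "type": "object",
        "properties": {
            "address": {
                "type": "string",
                "description": "Full mailing address on one line.",
            },
            "output_format": {
                "type": "string",
                "enum": ["single_line", "multi_line"],  # categorical, so use an enum
                "description": "Layout of the returned address.",
            },
        },
        "required": ["address"],
    },
}

# The kind of afterthought definition the report warns about.
VAGUE_TOOL: dict[str, Any] = {
    "name": "process",
    "description": "processes data",
    "parameters": {"type": "object", "properties": {"input": {"type": "string"}}},
}


def lint_tool(tool: dict[str, Any], min_words: int = 10) -> list[str]:
    """Return problems found in a tool definition (min_words is an arbitrary threshold)."""
    problems = []
    if len(tool.get("description", "").split()) < min_words:
        problems.append(f"{tool['name']}: description too short to guide selection")
    for name, spec in tool["parameters"]["properties"].items():
        if not spec.get("description"):
            problems.append(f"{tool['name']}.{name}: missing parameter description")
    return problems


def invalid_enum_arguments(tool: dict[str, Any], arguments: dict[str, Any]) -> list[str]:
    """Return names of arguments whose values fall outside the schema's enum."""
    props = tool["parameters"]["properties"]
    return [
        name
        for name, value in arguments.items()
        if "enum" in props.get(name, {}) and value not in props[name]["enum"]
    ]


print(lint_tool(VAGUE_TOOL))
print(invalid_enum_arguments(ADDRESS_TOOL, {"address": "1 Main St", "output_format": "pretty"}))
```

Running a lint like this in CI is one way to keep one-line descriptions from slipping in. The enum check shows how a hallucinated value such as `"pretty"` can be rejected before the tool executes.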
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T17:18:15.491824+00:00— report_created — created