Report #98318
[research] My agent calls the wrong tool or passes bad arguments
Keep tool schemas small, name them by action, and write descriptions that explain when to use the tool and what each parameter means. For large tool registries, use lazy tool search (Anthropic Tool Search pattern) or MCP. On current benchmarks, Claude leads on multi-turn and parallel function calling; test your own registry because rankings vary by schema complexity.
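As an illustration of the schema advice above, here is a minimal sketch that contrasts a vague tool definition with a specific one and runs a small lint over both. The tool names, the `lint_tool` helper, and its thresholds are hypothetical. The `input_schema` key follows the JSON Schema layout used by Anthropic-style tool definitions, and other providers may name this field differently (for example `parameters`).

```python
# Hypothetical tool definitions; names and fields are illustrative only.
VAGUE_TOOL = {
    "name": "search",
    "description": "Searches.",
    "input_schema": {
        "type": "object",
        "properties": {"q": {"type": "string"}},
        "required": ["query"],  # does not match any declared property
    },
}

SPECIFIC_TOOL = {
    "name": "search_orders_by_customer_email",
    "description": (
        "Look up a customer's past orders. Use this when the user asks about "
        "order status or history and has provided an email address. "
        "Do not use it for product catalog searches."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "email": {
                "type": "string",
                "description": "Customer email address, e.g. 'ana@example.com'.",
            },
            "max_results": {
                "type": "integer",
                "description": "Maximum number of orders to return.",
            },
        },
        "required": ["email"],
    },
}


def lint_tool(tool: dict, min_description_chars: int = 40) -> list[str]:
    """Return a list of schema-hygiene problems (empty list means none found).

    The checks are heuristics chosen for this sketch, not an official rule set.
    """
    problems = []
    # Heuristic: action-style names such as verb_noun contain an underscore.
    if "_" not in tool["name"]:
        problems.append(f"name '{tool['name']}' is not action-specific")
    if len(tool.get("description", "")) < min_description_chars:
        problems.append("description is too short to state when to use the tool")
    properties = tool["input_schema"].get("properties", {})
    for param, spec in properties.items():
        if not spec.get("description"):
            problems.append(f"parameter '{param}' has no description")
    for param in tool["input_schema"].get("required", []):
        if param not in properties:
            problems.append(f"required parameter '{param}' is not declared")
    return problems


if __name__ == "__main__":
    for tool in (VAGUE_TOOL, SPECIFIC_TOOL):
        print(tool["name"], lint_tool(tool))
```

Running a lint like this in CI is one way to catch ambiguous definitions before they reach the model; the specific checks should be adapted to your own registry.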
Journey Context:
Function-calling failures usually come from ambiguous schemas, not weak models. A tool named 'search' with a generic description is likely to be mis-invoked, because the model cannot tell when it applies or what its arguments mean. The fix is specificity: each description should state the trigger condition and the semantics of every parameter. For registries with hundreds of tools, dumping all definitions into context wastes tokens and degrades reasoning; the Tool Search pattern instead gives the model a single search tool and loads only the relevant definitions on demand. MCP standardizes tool discovery and invocation across providers, reducing vendor lock-in. The Berkeley Function-Calling Leaderboard (BFCL) is the de facto benchmark, but real-world reliability depends more on schema hygiene than on model choice.
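The lazy-loading idea described above can be sketched roughly as follows. This is not Anthropic's implementation: the registry contents, the `search_tools` function, the stopword list, and the keyword-overlap scoring are all assumptions made for illustration. A production system would more likely use embeddings or a proper search index.

```python
import re

# Hypothetical registry mapping tool names to their descriptions.
REGISTRY = {
    "create_calendar_event": (
        "Create a calendar event. Use when the user wants to schedule a meeting."
    ),
    "send_email_message": (
        "Send an email to one or more recipients. Use when the user asks to email someone."
    ),
    "get_weather_forecast": (
        "Get the weather forecast for a city. Use when the user asks about weather."
    ),
}

# Small illustrative stopword list so common words do not create false matches.
STOPWORDS = {"a", "an", "the", "to", "use", "when", "user", "is", "in", "for", "what", "or"}


def _tokens(text: str) -> set[str]:
    """Lowercase alphanumeric tokens with stopwords removed."""
    return set(re.findall(r"[a-z0-9]+", text.lower())) - STOPWORDS


def search_tools(query: str, registry: dict[str, str] = REGISTRY, top_k: int = 2) -> list[str]:
    """Return up to top_k tool names ranked by keyword overlap with the query.

    Only these tools' full definitions would then be loaded into the
    model's context, instead of the whole registry.
    """
    query_tokens = _tokens(query)
    scored = []
    for name, description in registry.items():
        score = len(query_tokens & _tokens(name + " " + description))
        if score > 0:
            scored.append((score, name))
    # Highest score first; ties broken alphabetically for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scored[:top_k]]


if __name__ == "__main__":
    print(search_tools("what is the weather forecast in Lisbon"))
```

In this sketch the model would be given only `search_tools` up front, and the agent loop would attach the matching definitions on the next turn. The trade-off is one extra round trip in exchange for a much smaller prompt.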
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-27T04:46:04.382463+00:00— report_created — created