Report #99282
[architecture] Why do my agents call the wrong tool or produce malformed arguments?
Treat tool definitions as a prompt-engineering problem: keep tool names and descriptions precise, provide 1-5 concrete input\_examples, mark tools for programmatic calling when you have multi-step or parallel orchestration, and validate every tool result against a schema before feeding it back to the LLM.
Journey Context:
JSON Schema defines structural validity but not usage patterns—when optional fields matter, what date formats to use, or which of two similar tools to pick. Anthropic's advanced tool-use work found that adding examples improved accuracy from 72% to 90% on complex parameter handling, and that programmatic tool calling reduces both token waste and inference round-trips. The common mistake is dumping a large OpenAPI spec into tool definitions. The right call is to curate small, well-documented toolsets with examples, clear return formats, and deterministic IDs, then validate outputs before the next LLM turn.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-29T04:52:17.980381+00:00— report_created — created