Report #93802
[research] Agent fails to select the correct tool or passes malformed JSON arguments to tool calls
Create a dedicated eval suite strictly for tool selection and schema adherence, decoupled from the agent's reasoning abilities. Test the LLM's ability to map natural language to the exact JSON schema of your tools.
Journey Context:
When an agent fails, developers often assume the LLM isn't smart enough. In practice, the cause is frequently a tool description or schema that is ambiguous or poorly named. If you evaluate the whole agent end to end, you can't tell whether a failure was a reasoning error or a schema-mapping error. By isolating the NL-to-JSON translation step into its own eval, you can iterate on tool descriptions and schemas until tool calling is as close to 100% reliable as you can get on that suite, and only then evaluate higher-level reasoning.
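A minimal sketch of such an isolated eval, using only the Python standard library. Everything named here is hypothetical and for illustration only: the two tools (get_weather, search_flights), the simplified schema format, the {"name": ..., "arguments": ...} call envelope, and the fake_model stub that stands in for a real LLM call. The point it shows is that each failure gets its own label, so a wrong tool choice is counted separately from malformed output or a schema violation.

```python
import json
from collections import Counter
from typing import Callable

# Hypothetical tool schemas (a tiny stand-in for real JSON Schema).
TOOLS = {
    "get_weather": {
        "required": ["city"],
        "properties": {"city": str, "unit": str},
    },
    "search_flights": {
        "required": ["origin", "destination"],
        "properties": {"origin": str, "destination": str, "max_price": int},
    },
}

# Each case pairs one utterance with the exact call we expect back.
CASES = [
    {"prompt": "What's the weather in Oslo?",
     "tool": "get_weather", "args": {"city": "Oslo"}},
    {"prompt": "Find flights from SFO to JFK under $300",
     "tool": "search_flights",
     "args": {"origin": "SFO", "destination": "JFK", "max_price": 300}},
]


def grade(raw: str, case: dict) -> str:
    """Classify one raw model output against one expected call."""
    try:
        call = json.loads(raw)
    except json.JSONDecodeError:
        return "malformed_call"
    # The envelope itself must be an object with an object of arguments.
    if not isinstance(call, dict) or not isinstance(call.get("arguments"), dict):
        return "malformed_call"
    if call.get("name") != case["tool"]:
        return "wrong_tool"
    schema = TOOLS[case["tool"]]
    args = call["arguments"]
    if any(key not in args for key in schema["required"]):
        return "schema_violation"
    for key, value in args.items():
        expected_type = schema["properties"].get(key)
        # Unknown keys and wrong types both count as schema violations.
        if expected_type is None or type(value) is not expected_type:
            return "schema_violation"
    return "ok" if args == case["args"] else "wrong_args"


def run_suite(model: Callable[[str], str]) -> Counter:
    """Run every case through the model and tally the outcome labels."""
    return Counter(grade(model(case["prompt"]), case) for case in CASES)


def fake_model(prompt: str) -> str:
    """Stub in place of a real LLM call (assumption: swap in your own client)."""
    if "weather" in prompt:
        return json.dumps({"name": "get_weather", "arguments": {"city": "Oslo"}})
    # Simulate a truncated, unparseable tool call.
    return '{"name": "search_flights", "arguments": {"origin": "SFO"'


if __name__ == "__main__":
    print(run_suite(fake_model))
```

Re-running the same suite after each change to a tool name, description, or schema shows which label moved. A real suite would likely validate against the full JSON Schema of each tool and use many more cases per tool, including near-miss prompts that could plausibly match two tools.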
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T16:02:09.519375+00:00— report_created — created