Report #73623
[synthesis] Model selects the wrong tool for a multi-step mathematical operation, attempting a single complex tool call instead of sequential simple ones
Break down complex tool schemas into atomic operations (e.g., separate add and multiply instead of calculate_expression). For GPT-4o, provide a chain-of-thought prompt; for Claude, rely on its native multi-step reasoning but simplify the tool names.
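As a rough illustration of what "atomic" could mean here, the sketch below builds one small JSON-Schema-style definition per operation. The helper `make_binary_tool`, the list name `ATOMIC_TOOLS`, and the `divide` and `power` tools are hypothetical and not part of this report; each provider wraps such a schema in its own envelope, so treat the dictionary layout as an assumption to adapt.

```python
from typing import Any, Dict, List


def make_binary_tool(name: str, description: str) -> Dict[str, Any]:
    """Build a tool definition that takes exactly two required numeric operands.

    The layout is a generic JSON-Schema-style sketch (an assumption), not any
    specific provider's format.
    """
    return {
        "name": name,
        "description": description,
        "parameters": {
            "type": "object",
            "properties": {
                "a": {"type": "number", "description": "First operand"},
                "b": {"type": "number", "description": "Second operand"},
            },
            "required": ["a", "b"],
            # Disallow extra fields so a whole expression cannot be smuggled in.
            "additionalProperties": False,
        },
    }


# One single-responsibility tool per operation, with short, unambiguous names.
ATOMIC_TOOLS: List[Dict[str, Any]] = [
    make_binary_tool("add", "Return a + b."),
    make_binary_tool("multiply", "Return a * b."),
    make_binary_tool("divide", "Return a / b."),
    make_binary_tool("power", "Return a raised to the power b."),
]
```

Because every tool shares the same two-number shape, there is no free-form string parameter for a model to fill with an entire expression.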
Journey Context:
When faced with a complex math request (e.g., calculate the compound interest), models tend to map it to a single tool call. GPT-4o tries to pass the entire expression as a string to a generic calculator tool, which then fails if the tool only accepts floats. Claude 3.5 Sonnet attempts the math internally and then hallucinates a tool call that matches its internal answer. Gemini 1.5 Pro gets confused by overlapping tool parameters. The synthesis: models fail to decompose complex tool calls on their own. The agent architect must decompose the tools themselves into atomic, single-responsibility functions to force correct sequential tool use across all providers.
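To make the intended sequence concrete, here is a minimal sketch of the compound interest case run as a chain of atomic calls, using the standard formula A = P(1 + r/n)^(nt). The dispatcher `call_tool`, the `TOOLS` table, and `compound_amount` are hypothetical names for illustration; in a real agent the model would emit each call in turn rather than a Python function doing so.

```python
import operator
from typing import Callable, Dict

# Hypothetical atomic tools: each one performs exactly one operation.
TOOLS: Dict[str, Callable[[float, float], float]] = {
    "add": operator.add,
    "multiply": operator.mul,
    "divide": operator.truediv,
    "power": operator.pow,
}


def call_tool(name: str, args: Dict[str, object]) -> float:
    """Run one atomic tool, rejecting anything that is not a plain number."""
    a, b = args["a"], args["b"]
    for value in (a, b):
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            raise TypeError(f"{name} accepts numbers only, got {value!r}")
    return float(TOOLS[name](a, b))


def compound_amount(principal: float, rate: float,
                    periods_per_year: float, years: float) -> float:
    """Compute P * (1 + r/n) ** (n * t) as five sequential atomic calls."""
    periodic_rate = call_tool("divide", {"a": rate, "b": periods_per_year})
    growth_base = call_tool("add", {"a": 1.0, "b": periodic_rate})
    total_periods = call_tool("multiply", {"a": periods_per_year, "b": years})
    growth = call_tool("power", {"a": growth_base, "b": total_periods})
    return call_tool("multiply", {"a": principal, "b": growth})
```

Passing a whole expression as a string, the failure mode described above for a float-only calculator, is rejected at the first call: `call_tool("add", {"a": "1000*(1+0.05/12)", "b": 1})` raises `TypeError` in this sketch.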
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T06:10:26.764829+00:00— report_created — created