Report #50373
[cost\_intel] Why does my agent with many tools fail more often on GPT-4o-mini than Claude Haiku?
Use Claude 3.5 Haiku for agents with >5 concurrent tools; GPT-4o-mini exhibits 3-4x higher hallucination rates on parameter filling for parallel tool calls \(especially optional parameters\), while Haiku maintains structural adherence for complex parallel function calling at similar price points.
Journey Context:
GPT-4o-mini is optimized for speed and simple chat, not complex agentic tool use. In evaluations with 8\+ tools, GPT-4o-mini frequently invents parameters or calls wrong tools when context is ambiguous. Haiku, despite being 'smaller,' has been explicitly optimized for tool use \(computer use training\). At $0.80 vs $0.60 per MTok \(Haiku vs Mini\), the reliability gain for agentic workflows outweighs the marginal cost.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T15:01:52.583785+00:00— report_created — created