Report #69853
[cost_intel] When do GPT-4o-mini's function-calling failures cancel out its cost advantage?
Avoid GPT-4o-mini for function calling with nested object schemas (depth >2) or arrays of objects; use GPT-4o instead. GPT-4o-mini has higher failure rates than GPT-4o on complex tool schemas (e.g., nested SQL queries), and the resulting retry loops can erase its roughly 17x input-price advantage ($0.15 vs. $2.50 per 1M input tokens).
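The routing rule above can be sketched as a check over an OpenAI-style tool definition (`{"type": "function", "function": {"parameters": ...}}`). This is a minimal sketch under stated assumptions: `parameters` is JSON Schema in which every `type` is a single string, and the helper names and thresholds are hypothetical. It applies the stricter "flat schemas with <5 parameters" rule from the journey context and always routes arrays of objects to GPT-4o.

```python
from typing import Any

# Hypothetical thresholds taken from this report's rules of thumb, not OpenAI guidance.
MAX_DEPTH_FOR_MINI = 1   # 1 = flat; the looser "avoid depth > 2" cutoff would be 2
MAX_PARAMS_FOR_MINI = 4  # "flat schemas with <5 parameters"


def object_depth(schema: dict[str, Any]) -> int:
    """Nesting depth of objects; the top-level `parameters` object counts as 1.
    Objects reached through an array's `items` count too.
    Assumes `type` is a single string, not a list like ["object", "null"]."""
    kind = schema.get("type")
    if kind == "object":
        children = schema.get("properties", {}).values()
        return 1 + max((object_depth(c) for c in children), default=0)
    if kind == "array":
        return object_depth(schema.get("items", {}))
    return 0


def has_array_of_objects(schema: dict[str, Any]) -> bool:
    """True if any array anywhere in the schema has object-typed items."""
    kind = schema.get("type")
    if kind == "array":
        items = schema.get("items", {})
        return items.get("type") == "object" or has_array_of_objects(items)
    if kind == "object":
        return any(has_array_of_objects(c) for c in schema.get("properties", {}).values())
    return False


def choose_model(tool: dict[str, Any]) -> str:
    """Route a tool definition to gpt-4o-mini only if its schema is simple."""
    params = tool["function"].get("parameters", {"type": "object", "properties": {}})
    too_deep = object_depth(params) > MAX_DEPTH_FOR_MINI
    too_many = len(params.get("properties", {})) > MAX_PARAMS_FOR_MINI
    if too_deep or too_many or has_array_of_objects(params):
        return "gpt-4o"
    return "gpt-4o-mini"


# Hypothetical tools: a flat weather lookup and a query tool taking an array of filter objects.
weather = {"type": "function", "function": {"name": "get_weather", "parameters": {
    "type": "object",
    "properties": {"city": {"type": "string"}, "unit": {"type": "string", "enum": ["c", "f"]}},
    "required": ["city"]}}}

sql = {"type": "function", "function": {"name": "run_query", "parameters": {
    "type": "object",
    "properties": {"filters": {"type": "array", "items": {
        "type": "object",
        "properties": {"column": {"type": "string"}, "op": {"type": "string"}, "value": {"type": "string"}}}}}}}}

print(choose_model(weather))  # gpt-4o-mini
print(choose_model(sql))      # gpt-4o
```

Keeping the depth and array checks separate means that relaxing `MAX_DEPTH_FOR_MINI` to 2 still sends arrays of objects to the larger model, matching both criteria in the summary.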
Journey Context:
Teams adopt GPT-4o-mini for agent tool use because of its input pricing: $0.15 vs. $2.50 per 1M input tokens for GPT-4o. However, its function-calling reliability degrades as schema complexity grows. Simple flat schemas work; nested objects fail more often. Each failed tool call has to be handled, either by retrying with the larger model or through error handling, which adds latency and cost. The cost of a failed request (user friction plus compute) outweighs the $2.35 per 1M input tokens saved. Use mini only for flat schemas with fewer than 5 parameters; use GPT-4o or Claude 3.5 Sonnet for complex agent tool use.
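A rough expected-cost model makes this trade-off concrete. It is a minimal sketch under stated assumptions: one mini attempt, then one GPT-4o retry that always succeeds; input tokens only, at the prices quoted above; and a hypothetical `failure_penalty` in dollars standing in for latency and user friction. The function names and the 2,000-token request are illustrative, not measurements.

```python
# Input prices quoted in this report, USD per 1M input tokens.
# Output-token pricing is deliberately left out to keep the sketch small.
PRICE_MINI = 0.15
PRICE_4O = 2.50


def request_cost(tokens_in: int, price_per_million: float) -> float:
    """Input-token cost of one request in USD."""
    return tokens_in * price_per_million / 1_000_000


def expected_cost_mini_first(tokens_in: int, fail_rate: float, failure_penalty: float) -> float:
    """Expected cost of calling gpt-4o-mini once and, on a failed tool call,
    paying `failure_penalty` (hypothetical USD value of latency and user
    friction) plus a retry on gpt-4o, which is assumed to succeed."""
    fallback = request_cost(tokens_in, PRICE_4O) + failure_penalty
    return request_cost(tokens_in, PRICE_MINI) + fail_rate * fallback


def breakeven_fail_rate(tokens_in: int, failure_penalty: float) -> float:
    """Mini failure rate at which mini-first costs the same as gpt-4o alone.
    Solves c_mini + p * (c_4o + penalty) = c_4o for p."""
    c_mini = request_cost(tokens_in, PRICE_MINI)
    c_4o = request_cost(tokens_in, PRICE_4O)
    return (c_4o - c_mini) / (c_4o + failure_penalty)


# Hypothetical request: 2,000 input tokens (prompt plus tool schemas).
for penalty in (0.0, 0.01, 0.05):
    print(penalty, round(breakeven_fail_rate(2_000, penalty), 3))
# 0.0 0.94
# 0.01 0.313
# 0.05 0.085
```

With no penalty, mini-first only loses once more than about 94% of its tool calls fail, because a failed mini call costs so little in tokens. The break-even failure rate drops quickly as the penalty grows, so under these assumptions the user-friction part of the failure cost is what decides whether mini still pays off.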
⚠ Workarounds are unverified; always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T23:44:03.371897+00:00— report_created — created