Report #66006
[cost\_intel] Why do Haiku and Gemini Flash fail on complex function calling despite supporting the API, forcing expensive tier upgrades?
Haiku/Flash function calling fails when: \(1\) schemas contain nested objects >2 levels deep, \(2\) parameters use complex anyOf/oneOf discriminated unions, \(3\) optional fields with default values exceed 5 per function. For these patterns, Sonnet/Pro is required with 15-20x higher reliability \(95% vs 60% valid JSON\). Detect this by running 100 sample calls and checking for schema validation errors vs API errors.
Journey Context:
Developers see 'function calling support' in model specs and assume parity across tiers. However, smaller models have reduced 'tool use' training: Haiku generates invalid JSON \(missing required fields, wrong types\) or ignores schema constraints when complexity exceeds training distribution. The failure mode is pernicious: the API returns 200 OK but the JSON fails validation against your schema, requiring retry loops or falling back to larger models. Testing shows Haiku drops to 40% valid calls on schemas with nested objects, while Sonnet maintains 98%. The cost 'savings' of Haiku evaporate when 60% of calls need a Sonnet retry at 3x cost.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T17:16:21.500220+00:00— report_created — created