Report #64265
[cost_intel] Function calling reliability degradation in small models with complex schemas
Avoid Haiku/GPT-3.5-turbo for function calling when a schema has >5 nested objects or >10 enum values; use Sonnet/GPT-4 instead. Small models show 85%+ reliability on flat schemas with <5 parameters, dropping to 40% on deeply nested schemas (especially those using JSON Schema 'anyOf' or 'oneOf'), while Sonnet maintains 90%+ at up to 10x the schema complexity.
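A minimal sketch of how the thresholds above could be applied before a call is made, assuming the tool schema is available as a plain JSON Schema dict. The function names `schema_stats` and `pick_tier` and the tier labels "small" and "large" are hypothetical, and sending any schema that contains 'anyOf' or 'oneOf' to the larger tier is an assumption layered on top of the report's two numeric thresholds.

```python
from typing import Any

# Thresholds taken from the report; tier names are placeholders, not model IDs.
MAX_NESTED_OBJECTS = 5
MAX_ENUM_VALUES = 10


def schema_stats(schema: dict) -> dict:
    """Collect rough complexity counts from a JSON Schema dict."""
    stats = {"nested_objects": 0, "max_enum": 0, "has_union": False}

    def walk(node: Any, is_root: bool = False) -> None:
        if not isinstance(node, dict):
            return
        # Count every object below the root as one nested object.
        if node.get("type") == "object" and not is_root:
            stats["nested_objects"] += 1
        enum = node.get("enum")
        if isinstance(enum, list):
            stats["max_enum"] = max(stats["max_enum"], len(enum))
        for key in ("anyOf", "oneOf"):
            branches = node.get(key)
            if isinstance(branches, list):
                stats["has_union"] = True
                for branch in branches:
                    walk(branch)
        for child in node.get("properties", {}).values():
            walk(child)
        walk(node.get("items"))

    walk(schema, is_root=True)
    return stats


def pick_tier(schema: dict) -> str:
    """Return 'large' when the schema crosses any threshold, else 'small'."""
    stats = schema_stats(schema)
    if (stats["nested_objects"] > MAX_NESTED_OBJECTS
            or stats["max_enum"] > MAX_ENUM_VALUES
            or stats["has_union"]):  # assumption: unions always go to the larger tier
        return "large"
    return "small"


flat = {
    "type": "object",
    "properties": {"city": {"type": "string"}, "units": {"enum": ["c", "f"]}},
}
wide_enum = {
    "type": "object",
    "properties": {"code": {"enum": [str(i) for i in range(11)]}},
}
print(pick_tier(flat))
print(pick_tier(wide_enum))
```

The walker only follows `properties`, `items`, `anyOf`, and `oneOf`, so schemas that rely on `$ref` or `allOf` would need extra handling before the counts mean much.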
Journey Context:
Engineers assume function calling 'just works' across models, but small models struggle with schema complexity in distinct ways: they hallucinate enum values that are not in the list, omit required nested fields, or fail to respect 'anyOf' discriminators. The price gap is roughly 12x (Haiku at $0.25 vs Sonnet at $3 per 1M tokens), but debugging schema failures in production costs more than the token savings. Quality degradation signature: validation errors spike on nested objects, not on top-level fields.
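One way to watch for the degradation signature described above is to validate each returned argument object and record where the errors sit. The sketch below is a deliberately small hand-rolled checker covering only `required` and `enum` (it is not a full JSON Schema validator, and a real deployment would more likely use an established validation library). The function names, the example schema, and the example arguments are all hypothetical.

```python
def find_errors(schema: dict, value, path: str = "") -> list:
    """Return (path, kind) tuples for enum and required-field violations."""
    errors = []
    where = path or "<root>"
    if "enum" in schema and value not in schema["enum"]:
        errors.append((where, "enum"))
    if schema.get("type") == "object":
        if not isinstance(value, dict):
            errors.append((where, "type"))
            return errors
        for name in schema.get("required", []):
            if name not in value:
                child = f"{path}.{name}" if path else name
                errors.append((child, "required"))
        for name, sub in schema.get("properties", {}).items():
            if name in value:
                child = f"{path}.{name}" if path else name
                errors.extend(find_errors(sub, value[name], child))
    return errors


def split_by_depth(errors: list) -> dict:
    """Separate top-level errors from nested ones using the dotted path."""
    nested = [e for e in errors if "." in e[0]]
    top = [e for e in errors if "." not in e[0]]
    return {"top_level": top, "nested": nested}


# Hypothetical tool schema and a model response with two nested mistakes.
schema = {
    "type": "object",
    "required": ["city", "filters"],
    "properties": {
        "city": {"type": "string"},
        "filters": {
            "type": "object",
            "required": ["cuisine", "price"],
            "properties": {
                "cuisine": {"enum": ["thai", "italian", "mexican"]},
                "price": {"enum": ["low", "mid", "high"]},
            },
        },
    },
}
args = {"city": "Paris", "filters": {"cuisine": "fusion"}}

report = split_by_depth(find_errors(schema, args))
print(report)
```

Tracking the nested and top-level counts separately per model makes it easier to tell whether failures match the pattern in this report or point to something else, such as a prompt problem.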
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T14:21:35.667044+00:00— report_created — created