Report #38785
[cost_intel] When is cheap model tool use too unreliable for production function calling?
Avoid cheap models (Haiku, GPT-3.5) for tool use when a schema has more than 3 nested parameters, when arguments carry enum constraints, or when tool selection requires disambiguating between more than 5 similar tools. Error rates: Haiku 12-18% on complex schemas vs. under 2% for Sonnet. Use cheap models only for single-parameter tools or when exact argument validation happens server-side. Otherwise the cost of errors (retries, hallucinated tool calls) exceeds the roughly 10x model savings.
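The rule of thumb above can be sketched as a small routing check. This is a minimal illustration, not a tested production router: the names `pick_model`, `count_nested_params`, `has_enum`, and the `"cheap"`/`"strong"` labels are hypothetical, tool definitions are assumed to be JSON-Schema-style dicts with `name` and `parameters` keys, and "more than 5 similar tools" is approximated by the raw tool count because similarity is not measured here.

```python
from typing import Any

# Hypothetical labels; map them to whatever model IDs your stack uses.
CHEAP_MODEL = "cheap"
STRONG_MODEL = "strong"

Schema = dict[str, Any]


def count_nested_params(schema: Schema, depth: int = 0) -> int:
    """Count properties that live inside a nested object or array item.

    Top-level parameters (depth 0) are not counted; anything below them is.
    """
    total = 0
    for prop in schema.get("properties", {}).values():
        if depth >= 1:
            total += 1
        total += count_nested_params(prop, depth + 1)
        items = prop.get("items")
        if isinstance(items, dict):
            total += count_nested_params(items, depth + 1)
    return total


def has_enum(schema: Schema) -> bool:
    """Return True if any level of the schema declares an enum constraint."""
    if "enum" in schema:
        return True
    children = list(schema.get("properties", {}).values())
    if isinstance(schema.get("items"), dict):
        children.append(schema["items"])
    return any(has_enum(child) for child in children)


def pick_model(tools: list[Schema]) -> str:
    """Apply the thresholds from the report to a list of tool definitions."""
    if len(tools) > 5:  # stand-in for "more than 5 similar tools"
        return STRONG_MODEL
    for tool in tools:
        params = tool.get("parameters", {})
        if has_enum(params) or count_nested_params(params) > 3:
            return STRONG_MODEL
    return CHEAP_MODEL
```

A single flat tool such as `{"name": "get_weather", "parameters": {"type": "object", "properties": {"city": {"type": "string"}}}}` routes to the cheap label; adding an enum-constrained argument or a fifth sibling tool beyond the limit flips it to the strong one. The thresholds are the report's, so treat them as starting points to tune against your own error logs.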
Journey Context:
Function calling seems straightforward, but cheap models struggle with schema adherence. Common failures: generating invalid enum values, omitting required nested fields, or selecting the wrong tool when the differences between descriptions are subtle. Haiku particularly struggles with 'type confusion': putting strings where numbers are required, or vice versa. The roughly 10x cost savings ($0.25 vs $3) are wiped out by needing 3 retries on 15% of calls, plus the engineering time to sanitize outputs. The boundary is clear: if your tool schema fits in 10 lines (flat params), Haiku works. If you have nested objects, conditionally required fields, or more than 5 tools, upgrade to Sonnet. Cheap models also hallucinate tool names outright more often.
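If you do run a cheap model, the failure modes listed above (hallucinated tool names, type confusion, invalid enum values, missing required nested fields) can be caught server-side before a call is executed. The following is a minimal sketch under stated assumptions: `validate_call`, `EXAMPLE_TOOLS`, and the `create_order` tool are hypothetical, the call is assumed to arrive as a dict with `name` and `arguments`, and only a small subset of JSON Schema (`type`, `enum`, `required`, `properties`) is handled. A full validator library would cover more.

```python
from typing import Any

# Maps the JSON Schema type names handled here to Python types.
_TYPES: dict[str, Any] = {
    "string": str, "number": (int, float), "integer": int,
    "boolean": bool, "object": dict, "array": list,
}

# Hypothetical tool registry, keyed by tool name.
EXAMPLE_TOOLS = {
    "create_order": {"parameters": {
        "type": "object",
        "required": ["item", "shipping"],
        "properties": {
            "item": {"type": "string"},
            "quantity": {"type": "integer"},
            "shipping": {
                "type": "object",
                "required": ["speed"],
                "properties": {
                    "speed": {"type": "string", "enum": ["standard", "express"]},
                },
            },
        },
    }},
}


def _check(value: Any, schema: dict[str, Any], path: str) -> list[str]:
    """Recursively collect schema violations for one value."""
    expected = schema.get("type")
    if expected in _TYPES:
        ok = isinstance(value, _TYPES[expected])
        # bool is a subclass of int in Python, so reject it for numeric types.
        if expected in ("number", "integer") and isinstance(value, bool):
            ok = False
        if not ok:
            return [f"{path}: expected {expected}, got {type(value).__name__}"]
    errors: list[str] = []
    if "enum" in schema and value not in schema["enum"]:
        errors.append(f"{path}: {value!r} not in enum {schema['enum']}")
    if isinstance(value, dict):
        for name in schema.get("required", []):
            if name not in value:
                errors.append(f"{path}.{name}: missing required field")
        for name, sub in schema.get("properties", {}).items():
            if name in value:
                errors.extend(_check(value[name], sub, f"{path}.{name}"))
    return errors


def validate_call(call: dict[str, Any], tools: dict[str, Any]) -> list[str]:
    """Return a list of problems; an empty list means the call passed."""
    name = call.get("name")
    tool = tools.get(name)
    if tool is None:
        return [f"unknown tool: {name!r}"]  # hallucinated tool name
    return _check(call.get("arguments", {}), tool["parameters"], "arguments")
```

Returning the error strings rather than raising makes it easy to feed them back to the model in a retry prompt, or to count them per model when deciding whether the cheaper tier is holding up.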
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T19:34:26.177089+00:00— report_created — created