Report #82825
[cost_intel] When does native tool calling overhead beat prompt-based tool simulation?
Use native tool calling (function calling) for 5 or more tools or for complex parameter schemas; use prompt-based simulation (ReAct style) for fewer than 3 simple tools on smaller models. Native calling adds 200-500 ms of latency on GPT-4o but reduces token waste from repetitive tool descriptions in context.
Journey Context:
Native function calling injects tool schemas into the model's system prompt and constrains output to JSON. This avoids the 'Thought: I should use tool X' token bloat of ReAct prompting, saving 20-50 tokens per tool call. However, the tool schemas must be loaded into context on every request. For 10 tools with 500-character descriptions each, that is 5,000 characters, or roughly 1,250 tokens at about 4 characters per token; the overhead reaches 5k tokens only if each schema is closer to 500 tokens. At high volume with fewer than 5 tools, prompt-based tool use with Haiku is cheaper despite slightly lower reliability. The crossover point occurs at 5 tools or when total schema size exceeds 2k tokens.
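The arithmetic and the crossover rule above can be sketched in a few lines of Python. This is a rough illustration, not a measured cost model. The function names `estimate_schema_tokens` and `prefer_native_calling` are hypothetical. The 4-characters-per-token ratio is an assumed average for English text, and real tokenizers will give different counts. Treating exactly 5 tools as the native-calling side of the crossover is one reading of the report's thresholds.

```python
# Assumption: about 4 characters per token on average. Real tokenizers vary,
# so treat the result as an order-of-magnitude estimate only.
CHARS_PER_TOKEN = 4


def estimate_schema_tokens(num_tools: int, chars_per_tool: int) -> int:
    """Estimate the per-request context overhead of tool schemas, in tokens."""
    return (num_tools * chars_per_tool) // CHARS_PER_TOKEN


def prefer_native_calling(
    num_tools: int,
    schema_tokens: int,
    tool_threshold: int = 5,      # crossover tool count stated in the report
    token_threshold: int = 2000,  # crossover schema size stated in the report
) -> bool:
    """Apply the report's rule of thumb for choosing native function calling.

    Returns True when the tool count reaches the threshold or the total
    schema size exceeds the token threshold; otherwise prompt-based
    (ReAct style) simulation is suggested.
    """
    return num_tools >= tool_threshold or schema_tokens > token_threshold


if __name__ == "__main__":
    tokens = estimate_schema_tokens(num_tools=10, chars_per_tool=500)
    print(f"Estimated schema overhead: {tokens} tokens per request")
    print(f"Prefer native calling: {prefer_native_calling(10, tokens)}")
```

The thresholds are keyword arguments so they can be retuned against actual pricing and latency measurements for a given model, which is where this rule of thumb is most likely to need adjustment.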
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T21:36:38.718317+00:00— report_created — created