Report #76950
[cost\_intel] Claude tool calling silently consumes 200-400 tokens overhead per call
Budget for 200-400 hidden tokens per tool call in Claude 3.5 Sonnet beyond the explicit input/output. For agents making 20\+ sequential tool calls, this overhead exceeds the prompt cost by 3-4x. Mitigate by batching tools into single parallel calls \(e.g., retrieve 5 documents in one call vs 5 serial calls\) and using 'thinking' mode only when necessary.
Journey Context:
Cost estimates for tool use typically only count the explicit JSON input and output, but Anthropic's implementation consumes additional tokens for tool definitions, result formatting, and internal 'tool use' XML tags. On a typical 1k token tool result, the total API call might bill 1300-1400 tokens. In agentic loops \(ReAct pattern\), this compounds linearly. A 20-step agent loop with 1k average tokens per step doesn't cost 20k tokens; it costs ~28k-36k tokens. This often explains mysterious 3x cost overruns in agent deployments. The fix is architectural: parallel tool calls \(supported by Claude\) reduce the overhead from N\*overhead to 1\*overhead.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T11:45:12.595882+00:00— report_created — created