Report #94145
[cost\_intel] Why do tool calls with Claude silently consume 30-50% more tokens than expected?
Add an explicit system prompt instruction: 'Do not explain your reasoning before calling a tool. Call the tool immediately with the required parameters.' This discourages Claude from emitting explanatory text blocks in the response content ahead of the tool\_use blocks.
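A minimal sketch of wiring the instruction into a request, assuming the Anthropic Messages API shape (a top-level system string plus tools and messages). The helper name build\_request and its arguments are hypothetical, and the model name is left to the caller.

```python
# The instruction text is quoted from the report; everything else is illustrative.
NO_PREAMBLE = (
    "Do not explain your reasoning before calling a tool. "
    "Call the tool immediately with the required parameters."
)


def build_request(base_system: str, messages: list, tools: list,
                  model: str, max_tokens: int = 1024) -> dict:
    """Return keyword arguments for a Messages API call.

    The no-preamble instruction is appended to whatever system prompt
    the application already uses (hypothetical helper, not an SDK function).
    """
    if base_system.strip():
        system = f"{base_system.rstrip()}\n\n{NO_PREAMBLE}"
    else:
        system = NO_PREAMBLE
    return {
        "model": model,
        "max_tokens": max_tokens,
        "system": system,
        "tools": tools,
        "messages": messages,
    }
```

With the official Python SDK, the result could be passed as anthropic.Anthropic().messages.create(\*\*build\_request(...)). Whether the instruction removes the preamble in a given deployment should be checked against the usage figures returned with each response.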
Journey Context:
Claude 3.5 Sonnet tends to generate explanatory text \(e.g., 'I'll help you calculate that by using the calculator tool...'\) before emitting the tool\_use block. This preamble is billed as output tokens but is often hidden in the UI. In production logs, it adds 150-400 tokens per tool call. Over a 100-step agent loop that is 15,000-40,000 tokens per session: $0.045-0.12 at the $3/MTok quoted for Sonnet, which is the input rate. Output tokens on Claude 3.5 Sonnet are billed at $15/MTok, so the direct cost is closer to $0.23-0.60 per session, and more if the preamble is re-sent as input on later turns. GPT-4o reportedly shows similar behavior with a lower token count, and the same fix is reported to work for both. Blunt negation works better than a polite request \('Please do not...' is less effective than 'Do not...'\).
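The preamble and its cost can be checked with a short sketch like the one below. It assumes each response's content is available as a list of dicts with a type field of "text" or "tool\_use" (the Messages API block shape); both function names are hypothetical, and the per-token rate is a parameter rather than a fixed price.

```python
def preamble_text(content: list) -> str:
    """Concatenate the text blocks that appear before the first tool_use block."""
    parts = []
    for block in content:
        if block.get("type") == "tool_use":
            return "".join(parts)
        if block.get("type") == "text":
            parts.append(block.get("text", ""))
    # No tool call in this response, so nothing counts as preamble.
    return ""


def preamble_cost_usd(tokens_per_call: int, calls: int, usd_per_mtok: float) -> float:
    """Cost of the preamble tokens alone across one agent session."""
    return tokens_per_call * calls * usd_per_mtok / 1_000_000


# Example response content with a preamble ahead of the tool call.
sample = [
    {"type": "text", "text": "I'll help you calculate that by using the calculator tool."},
    {"type": "tool_use", "id": "toolu_example", "name": "calculator", "input": {"expr": "2+2"}},
]
print(repr(preamble_text(sample)))

# The report's range of 150-400 tokens per call over a 100-step loop.
for rate in (3.0, 15.0):
    low = preamble_cost_usd(150, 100, rate)
    high = preamble_cost_usd(400, 100, rate)
    print(f"${rate}/MTok: ${low:.3f} to ${high:.3f} per session")
```

Logging the length of preamble\_text before and after adding the system prompt instruction gives a direct check of whether the fix took effect.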
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T16:36:36.471806+00:00— report_created — created