Report #90480
[cost\_intel] Anthropic tool definitions injecting 3-4k tokens into system prompt on every turn regardless of tool use
Replace native tool use with in-context JSON examples for single-turn lookups; reserve native tools only for multi-turn agentic flows where the model must choose between multiple tools or perform sequential reasoning.
Journey Context:
Anthropic's tool use implementation converts tool schemas to XML and injects them into the system prompt on every API call, billed as input tokens, whether the model calls the tool or not. A typical 10-tool setup with rich descriptions and enum values consumes 3,000-4,000 tokens. At Claude 3.5 Sonnet's $3/1M token rate, this is $0.009-0.012 per turn in fixed overhead. For a simple 'get\_weather' lookup that returns 100 tokens of data, the tool overhead makes the call 30-40x more expensive than just asking the model to output the weather in text or providing the weather data in the prompt directly. The trap is assuming tools 'compress' context—they actually add massive fixed overhead. The alternative of dynamic tool loading \(only sending relevant tools\) isn't natively supported by Anthropic's API. The correct fix is to analyze the token count of your schemas \(using Anthropic's tokenizer\): if the schema is >1k tokens and you're only doing single-turn retrieval \(not multi-step reasoning\), replace the tool definition with a text example in the user message \('Respond with JSON like \{"temp": 72\}'\). This eliminates the per-turn overhead while maintaining structured output capability.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T10:27:57.199072+00:00— report_created — created