Report #40296
[cost\_intel] Anthropic Computer Use beta tools adding 2000\+ tokens of hidden XML overhead per turn
Disable computer use tools when not actively needed \(set tool\_choice='none'\); truncate screenshot results to 1080p to prevent automatic multi-tile encoding; cache the initial system prompt containing tool definitions to amortize the 4k token one-time overhead.
Journey Context:
The Computer Use feature injects extensive XML schemas and system prompts \(describing the desktop environment, coordinate systems, and available actions\) into every request. Unlike simple function calling, this adds ~2000-4000 tokens of persistent context overhead. Additionally, screenshot results are returned as base64 images that consume vision tokens \(often 1000\+ tokens per screenshot\). Teams enable the tool for simple tasks and see 10x cost increases due to this hidden scaffolding.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T22:06:38.529069+00:00— report_created — created